Polymorphic mesh network image processing system.



The polymorphic mesh uses physical mesh connections to form twelve useful connection patterns among the processing elements (2) making up an image processor of cellular automata under software control. Each processing element includes a limited mesh of interconnections to related processing elements, which provides a programmable choice of network configuration. The limited mesh of network interconnections is controlled by information stored in a register within the affected processing element. The interconnection pattern controlled by this information is invoked by programming, or by the combination of programming and process data, so as to configure the network of processing elements dynamically in the desired mesh. Representative configurations are:

string; mesh; tree; cube; pyramid.
POLYMORPHIC MESH NETWORK IMAGE PROCESSING SYSTEM



BACKGROUND OF THE INVENTION 
1. Field of the Invention 

This invention relates to an array processor made up of a network of processing elements, and more particularly relates to an array processing network in which each processing element in the array of processing elements is equipped with a program-accessible connection control mechanism with a controllable limited mesh of interconnections to related processing elements, so as to provide a programmable choice of network configuration.



2. Description of the Prior Art 

The following publications are representative of the prior art:



Published Articles 

1. Sternberg, "Biomedical image processing," Computer, Jan. 1983, shows an array of cellular 
automata of identical cells connected to their nearest neighbor for iterative neighborhood processing of 
digital images. A serpentine shift register serially configures neighborhood inputs to the 3 x 3 neighborhood. 

2. Turney et al., "Recognizing Partially Occluded Parts," IEEE Transactions on Pattern Analysis and Machine Intelligence, July 1985, pp. 410-421, shows the use of a variety of techniques, including the Hough transform and weighted template matching.

3. Mudge et al., "Efficiency of Feature Dependent Algorithms for the Parallel Processing of Images," IEEE 0190-3918/83/0000/0369, 1983, pp. 369-373, shows how the architecture of an image processing system can benefit from configuration as multiple subimage processors in which processing elements communicate through some form of communication network. The authors explore the difference between feature-dependent algorithms and feature-independent algorithms.

4. Sternberg et al., "Industrial Morphology," shows the combination in a single system of image processing and pattern recognition.

5. D.E. Shaw, "The NON-VON Supercomputer," internal report, Columbia University, Aug. 1982, shows a massively parallel system with an I/O switch in each processing element and with flag registers to activate and deactivate individual PEs.

6. M.J. Kimmel, R.S. Jaffe, J.R. Mandeville, and M.A. Lavin, "MITE: Morphic Image Transform Engine, An Architecture for Reconfigurable Pipelines of Neighborhood Processors," IBM RC11438, Oct. 10, 1985, shows a reconfigurable network of processing elements capable of a variety of PE-to-PE interconnections via bus connections under operator control.

7. A.J. Kessler and J.H. Patel, "Reconfigurable Parallel Pipelines for Fault Tolerance," IEEE CH1813-5/82/0000/011a, 1982, shows reconfigurable pipeline connection for graceful degradation.

8. S.R. Sternberg, "Parallel Architecture for Image Processing," IEEE CH1515-6/79/0000-0712, 1979, shows a PE network with full connectivity.

9. T.N. Mudge, E.J. Delp, L.J. Siegel and H.J. Siegel, "Image Coding Using the Multimicroprocessor System PASM," IEEE 82CH1761-6/82/0000/0200, 1982, shows processing element interconnection by an interconnection network.

10. S.R. Sternberg, "Language and Architecture for Parallel Image Processing," Pattern Recognition in Practice, North-Holland Publishing Co., 1980, shows a complex network of PEs and explains its operation.



Patents

• U.S. Patent 4,174,514, November 13, 1979, shows an array processor with adjacent neighborhood processing elements interconnected for mutual access to the image data in overlap areas of two adjacent image slices.

• U.S. Patent 4,215,401, CELLULAR DIGITAL ARRAY PROCESSOR, July 29, 1980, shows an array processor in which each processing element "cell" includes two accumulators arranged to connect that cell (except those at the array edge) with its two neighboring cells along one axis and with its two neighboring cells along the orthogonal axis.

• U.S. Patent 4,380,046, Fung, MASSIVELY PARALLEL PROCESSOR COMPUTER, April 12, 1983, shows an image processor which performs spatial translation by shifting or "sliding" bits vertically or horizontally to neighboring processing elements, P register to P register, as permitted by another register, called the G register. Each processing element includes an arithmetic, logic and routing unit (ALRU), an I/O unit and a local memory unit (LMU). The ALRU comprises three functional components: a binary counter/shift register subunit, a logic-slider subunit (P register) and a mask subunit (G register).

• U.S. Patent 4,398,176, Dargel et al., DATA ANALYZER WITH COMMON DATA/INSTRUCTION BUS, August 9, 1983, shows an image processing array in which each processing element includes external command control lines to control whether bus information is to be used as instruction or as data. Each processing element also includes a mechanism to determine from the bit structure of an instruction whether the instruction is local or global, and to stop forward transmission if local.

• U.S. Patent 4,601,055, Kent, IMAGE PROCESSOR, July 15, 1986, shows an iconic-to-iconic low-level image processor with pixel-by-pixel forward transformation and with pixel-by-pixel retrograde transformation.



The prior art provides for permanent configuration of processing elements in a pipeline or other fixed pattern in an image processing system. The prior art also provides for reconfiguration of an image processing system via switching networks and busses, and for convenient bit transfer to adjacent processing elements. The prior art does not, however, teach or suggest the invention, which provides for very high speed switching within the processing element, programmable as a polymorphic mesh, so as to optimize the configuration dynamically to the data being processed, in a configuration such as: string; mesh; tree; cube; pyramid.

Cellular automata have been found very useful for image processing, computer vision and other computations in physics. All existing interconnection networks for cellular automata assume a fixed pattern such as a string, a mesh, a tree, a cube or a pyramid. Each pattern is good for certain types of computing but is poor for computations that do not match the pattern. Since the network interconnection pattern is built in, it is fixed; it cannot be changed even when a mismatch is detected. The mismatch leads to poor efficiency.

For example, an NxN mesh is an optimal interconnection for local operations in image processing, but its performance is poor in computing a global operation (e.g. it takes on the order of N cycles to compute MINIMUM, a linear complexity). On the other hand, a tree interconnection is optimal for computing MINIMUM (it takes only log N cycles, a logarithmic complexity) but is very inefficient in computing the local operations of an image because of the lack of neighborhood connections.
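The step counts in this example can be checked with a small simulation. The sketch below is an illustration under our own assumptions, not the invention's control algorithm: `mesh_min_steps` repeats a synchronous nearest-neighbor MINIMUM on an NxN mesh until every PE holds the global minimum, and `tree_min_steps` performs a pairwise (binary-tree) reduction over the same N*N values. Both function names are hypothetical.

```python
def mesh_min_steps(grid):
    """Synchronous 4-neighbour MINIMUM on a mesh.

    Returns the number of steps until every PE holds the global minimum.
    """
    rows, cols = len(grid), len(grid[0])
    target = min(min(row) for row in grid)
    cur = [row[:] for row in grid]
    steps = 0
    while any(v != target for row in cur for v in row):
        nxt = [row[:] for row in cur]
        for r in range(rows):
            for c in range(cols):
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        nxt[r][c] = min(nxt[r][c], cur[rr][cc])
        cur = nxt
        steps += 1
    return steps


def tree_min_steps(values):
    """Binary-tree MINIMUM: each tree level costs one step."""
    level = list(values)
    steps = 0
    while len(level) > 1:
        level = [min(level[i:i + 2]) for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps


N = 8
grid = [[r * N + c for c in range(N)] for r in range(N)]  # minimum sits in a corner
print(mesh_min_steps(grid))                              # 14, i.e. 2(N-1): linear in N
print(tree_min_steps(v for row in grid for v in row))    # (0, 6), i.e. log2(N*N)
```

For N = 1024 the same worst case would be 2046 mesh steps against 20 tree levels.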

Besides the general inefficiency in computing when the interconnection does not match the algorithm, the fixed-pattern approach is inflexible for algorithm design. This is mainly caused by the restriction in data flow. For example, in the string interconnection, the data flow is in one direction only, from left to right. Such a restriction confines the algorithm domain; only algorithms that have a "string" data flow can benefit from the "string" network. In this regard, the fixed networks are very special-purpose and have a very narrow range of applications.

Another disadvantage of the fixed interconnection pattern is that it does not efficiently support iconic and intermediate processing simultaneously. This disadvantage is specific to one important application of cellular automata, computer vision, in which iconic (or image) processing and the transformation from iconic to symbolic information (called intermediate level processing) are both integral parts. A serious implication of this disadvantage is the I/O problem, because the image after iconic processing needs to be shipped outside of the network for further intermediate processing.



SUMMARY OF THE INVENTION

The object of the invention is to provide program-accessible convenient high speed connectivity to each 
processing element in an array, so that processing elements may be programmably grouped in an effective 
manner without incurring the cost of the complex connections required for universal connectivity. 
Another object of the invention is to provide convenient programmable control of connectivity of each
processing element in the array, so that processing elements may be programmably regrouped from time 
to time, both under adaptive control by the computer as it senses the need for optimization regrouping and 
under operator control as the operator foresees the need for optimization regrouping. 
Another object of the invention is to provide an external memory data connection to the processing
element, by which certain hardware may be eliminated and additional flexibility of operation may be gained. 

A feature of the invention is the use of an array of polymorphic mesh processing elements, each of which has processing capability embodied in an arithmetic and logic unit with memory, and also has programmable connection control capability with geographic destination connections and also logical connections.

Another feature of the invention is a provision for programmable short-circuit capability in the polymorphic mesh processing element. The short-circuit capability allows a series of intervening processing elements to serve simply as wire equivalents in transmitting data from a sending PE to a remote PE without cycle delay.

Another feature of the invention is a "polymorphic-mesh network," which is a composite network of a conventional mesh external to each PE and an internal network within each PE, to accommodate standard patterns and other new useful patterns through software control so that the connection can be matched to the computing. Through the "polymorphic" feature, the network can "reshape" itself adaptively, to allow flexible algorithm design and to cover a wider spectrum of applications.

A specific feature, related to computer vision, supports the intermediate level processing (iconic to 
symbolic transformation) by the polymorphic mesh, resulting in an efficient architecture while avoiding the 
serious I/O problem. 

Another specific feature of the invention is the provision of flag registers in the processing elements for use in conditional operations, including reconfiguration for adaptive self-optimization and for fail-soft capability.

Another feature of the invention is the provision of a limited number of multi-bit pattern registers, each accessing a limited number of software-selectable hardwired patterns, useful in cellular automata, which can be formed by the polymorphic mesh. These patterns include bus, several trees, cube and pyramid, each of which is optimal for a related type of computation, selectable by a crossbar switch in response to a bit pattern in a selected one of the pattern registers.

An advantage of the invention is its high throughput speed at relatively low cost, achieved by providing 
programmable limited connectivity within the processing element so as to permit optimization of array 
connection of processing elements both physically and electronically. 
Another advantage is that results of intermediate level processing may be used in conjunction with programming to provide adaptive reconfiguration efficiently as a joint function of programming and intermediate results of processing. This means that intermediate data can be used to invoke an adaptive optimization regrouping function; adaptive self-optimization and fail-soft capabilities result.

Another advantage is the simplicity and two-dimensional aspect of the invention, which permits vast numbers of interconnectable processing elements to be arrayed on a small number of chips.

The foregoing and other objects, features and advantages of the invention will be apparent from the 
more particular description of the preferred embodiment of the invention, as illustrated in the accompanying 
drawings. 


BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a system block diagram of an image processor made up of a network of polymorphic mesh processing elements, with one processing element shown in functional block diagram form.

FIG. 2 is a diagram of the switching capability of the connection control mechanism CCM of a polymorphic mesh processing element.

FIG. 3 is a functional block diagram of the connection control mechanism of a polymorphic mesh processing element.

FIG. 4 is a diagram showing formation of linear string arrays from polymorphic mesh processing elements.

FIG. 5 is a diagram showing formation of row trees from polymorphic mesh processing elements.

FIG. 6 is a diagram showing an alternative view of row trees.

FIGS. 7-10 are simplified diagrams showing chip area comparison and communication distance improvement as a result of using polymorphic mesh processing elements.

FIG. 11 is a diagram showing formation of reverse row trees from polymorphic mesh processing elements.

FIGS. 12-20 are diagrams showing representative choices of array configuration conveniently achievable by programmable interconnection of polymorphic mesh processing elements.
FIG. 21 is a more detailed block diagram of an individual polymorphic mesh processing element
according to a preferred embodiment of the invention. 



DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

FIG. 1 shows a polymorphic mesh network image processing system made up of an MxM array 1 of processing elements 2 under control of host computer H 3. Each processing element has a limited set of connections; in the preferred embodiment there are four connections, one to each of its adjacent orthogonal (non-diagonal) neighbors. These orthogonal neighbors will be referred to as Cartesian neighbors. These orthogonal connections are identified for processing element 4 with the directional identifications NESW. The function of these connections is to present the output of any processing element directly to a limited number of its neighbors (four in the preferred embodiment). Overall programming control and housekeeping control are provided by host computer 3, via bus 9.

One processing element is shown in greater detail. Processing element 5 is shown expanded to provide drawing space for internal components ALU 6, MEM 7 and CCM 8 and the NESW connections.

Each processing element (PE) in the preferred embodiment is equipped with four Cartesian connections. These Cartesian connections are designated NESW for convenience in discussion. They connect to the respectively adjacent PEs in the designated directions. This simple mesh of Cartesian connections would be capable, even without CCM 8, of making connection to the adjacent PE, which is a very important capability in image processing. This simple mesh is well suited to very large scale integration (VLSI) manufacturing.

Non-Cartesian PEs are not wired for direct connection. Diagonal connection and remote connection are not available on wires or metallization. Such connections would make manufacturing much more difficult; bundles of wires would add physical distance, with its inherent speed-of-light delay.

In the preferred embodiment, non-Cartesian PEs are accessed via intervening Cartesian PEs through programmed control of their respective CCM 8 capability. The CCM 8 pattern of connection is such that the input is effectively short-circuited to the desired output. Connection may follow the pattern of the chess rook (straight Cartesian with optional extension) or the chess knight with optional extension (Cartesian X followed by Cartesian Y, reaching a remote off-Cartesian PE), but not the chess bishop (diagonal with optional extension). Complex routing patterns may be set up. Complex routing to a non-Cartesian destination may be set up to pass through a great number of PEs without cycle delay.
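A rook- or knight-style route can be planned by listing, for each intervening PE, which input port its CCM must short-circuit to which output port. The sketch below assumes X-first routing and a coordinate system in which x grows eastward and y grows southward; `route_x_then_y` and its tuple format are our own illustration, not the patent's control algorithm. Because the intervening PEs are short-circuited rather than buffered, the route is assumed to add no per-PE cycle delay.

```python
# Port a signal leaves by -> port it enters the next PE by.
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
# Assumed coordinates: x grows eastward, y grows southward.
STEP = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}


def route_x_then_y(src, dst):
    """Knight-style route: Cartesian X first, then Cartesian Y.

    Returns the destination reached and, for every intervening PE, the
    (position, in_port, out_port) short circuit its crossbar must close.
    """
    (x, y), (x1, y1) = src, dst
    hops = ["E" if x1 > x else "W"] * abs(x1 - x)
    hops += ["S" if y1 > y else "N"] * abs(y1 - y)
    shorts = []
    pos = (x, y)
    for k, d in enumerate(hops):
        dx, dy = STEP[d]
        pos = (pos[0] + dx, pos[1] + dy)
        if k + 1 < len(hops):  # intervening PE: pass-through only
            shorts.append((pos, OPPOSITE[d], hops[k + 1]))
    return pos, shorts


end, shorts = route_x_then_y((0, 0), (3, 2))
for pe, inp, out in shorts:
    print(pe, inp, "->", out)
```

The same helper reaches the diagonal neighbor (1, 1) of PE (0, 0) through a single short-circuited PE at (1, 0), set W to S, which is how a mesh without diagonal wires still provides diagonal connection.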

As an alternative to the simple Cartesian connections of the preferred embodiment, the limited set of PE-PE interconnections might also include connections to diagonally adjacent PEs. Direct connections to highly remote PEs, however, are prohibitively complex, considering that such connections can be made under program control according to this invention by combinations of Cartesian or other simple connections.

FIG. 2 is a diagram of the switching capability of the connection control mechanism CCM 8 shown in FIG. 1. Its essential function is that of a switching network that connects any one of the connections NESW in the X crossbar with any one of the connections NESW in the Y crossbar, under control of the bit values in a pattern register. The preferred embodiment provides for selection between two pattern registers, with the selection of pattern register controlled by a bit value in a pattern selection register. Note that FIG. 2 is diagrammatic and does not show details of actual hardware. In the preferred embodiment not all of the connections available in a 4x4 matrix are necessarily used, although all are indeed available. Matrix 10 is shown as having inputs 11 in the Y dimension with connections to conductors 12 in the X dimension. As shown, this is set up to connect N connector 13 via connection electronics 14 and 15 and intersection 16 to E connection 17. Different connections to SWN connections 18, 19 and 20 may be alternatively or simultaneously activated. The control is by pattern register 21 bit values or by pattern register 22 bit values. Pattern register selection is by pattern register selection register 23. Pattern registers 21 and 22 are sixteen bits each, as shown by the slash marked 16 on the connection line. Connections to connection lines 24, 25 and 26 are also available. Crossbar switch 10 makes any one or a plurality of sixteen connections as controlled by the bit values in the selected one of pattern registers 21 or 22.
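The text does not give the assignment of the sixteen pattern-register bits to crossbar intersections. The sketch below assumes, purely for illustration, that bit 4*i + j closes the intersection joining input port i to output port j, with ports ordered N, E, S, W; `encode_pattern` and `decode_pattern` are hypothetical helpers. Under this assumption the FIG. 2 example (N connector 13 to E connection 17) is bit 1.

```python
PORTS = ("N", "E", "S", "W")


def encode_pattern(pairs):
    """Pack (input, output) port pairs into a 16-bit pattern word.

    Assumed layout: bit 4*i + j closes crossbar intersection (input i, output j).
    """
    word = 0
    for src, dst in pairs:
        word |= 1 << (4 * PORTS.index(src) + PORTS.index(dst))
    return word


def decode_pattern(word):
    """List the crossbar intersections closed by a 16-bit pattern word."""
    if not 0 <= word < 1 << 16:
        raise ValueError("a pattern register holds exactly 16 bits")
    return [(src, dst)
            for i, src in enumerate(PORTS)
            for j, dst in enumerate(PORTS)
            if (word >> (4 * i + j)) & 1]


print(encode_pattern([("N", "E")]))  # 2: the FIG. 2 example
print(decode_pattern(encode_pattern([("W", "E"), ("E", "W")])))
```

Several bits may be set at once, matching the statement that the crossbar makes "any one or a plurality" of its sixteen connections.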

Each processing element connects to one or more of its neighbors as directed by the bit values in one of its two pattern registers. Instantaneous change of connection can be made simply by switching control to the alternate pattern register. Selection of pattern registers is by the bit value in a simple binary pattern selection register. For purposes of explanation, the pattern registers may be considered the standard pattern register and the alternate pattern register, respectively accessed by a 1 or by a 0 value in the binary pattern selection register.
FIG. 3 is a functional block diagram of CCM 8 of FIG. 1, used to implement the switching capability described in FIG. 2 and also to carry out certain other direction and logical function controls. A simplified logical connective capability 30 is shown for AND, OR, XOR (Exclusive OR), and ANDALLBIT capability. Other capabilities are also available, but as the complexity increases the cost increases. The connection control mechanism receives flag inputs, which are stored in first flag register F1 31 and second flag register F2 32. These flags may be passed forward and may also be used to alter the control. Shift register mask 33 is provided to present outputs SRM 34 TRUE and SRM 35 COMPLEMENT, both of which are used to select a limited subset of the SRX 37 and SRY 39 bit values to control logical connective box 30.
The X register 36 and SRX shifter 37 function to control the logical connective and also the geometrical connection to a neighboring processing element. Similarly, Y register 38 and shift register Y SRY 39 function to control geometric and logical selection in the Y dimension. Usually X register 36 and Y register 38 contain the Cartesian coordinates of a processing element in the system. Shifters SRX 37 and SRY 39, in conjunction with SRM 33, are used to derive any contiguous bit group of X and Y. Detailed functions of FIG. 3 are outlined by twelve examples, formulating twelve different patterns from the polymorphic mesh.
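Deriving a contiguous bit group of a coordinate amounts to a shift followed by a mask. The sketch below models SRX (or SRY) as a right shift and SRM as a mask whose TRUE and COMPLEMENT outputs are both available; the function names and the 10-bit register width (enough for a 1024-wide array) are assumptions. As an illustrative use, not taken from the text, the low-order bits of X tell a PE whether it receives, sends or is bypassed at a given row-tree level.

```python
WIDTH = 10  # assumed coordinate width: 2**10 = 1024 PEs per row


def srm_masks(group_width, width=WIDTH):
    """TRUE and COMPLEMENT outputs of a mask selecting the low `group_width` bits."""
    true = (1 << group_width) - 1
    return true, ~true & ((1 << width) - 1)


def bit_group(coord, low, group_width):
    """Contiguous bits coord[low : low + group_width], via shift then mask."""
    true, _ = srm_masks(group_width)
    return (coord >> low) & true


def row_tree_role(x, level):
    """Role of PE x at row-tree level `level` (hypothetical use of a bit group)."""
    low = bit_group(x, 0, level + 1)  # the level+1 least significant bits of X
    if low == 0:
        return "receive"
    if low == 1 << level:
        return "send"
    return "bypass"  # neither end of a hop: short-circuited or idle


print(bit_group(0b101101, 2, 3))  # 3 (bits 2..4 of 45 are 011)
print([row_tree_role(x, 1) for x in range(8)])
```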

Each processing element is set up to carry out an arithmetic or logical operation or a logical transform operation on values presented to it or, alternatively, to perform a no-op. In addition to the operation or no-op, the PE is set to connect with selected neighboring processing elements. One such connection to neighboring processing elements is to serve as a short circuit. The short-circuit connection is essentially instantaneous: connection occurs at the speed of electricity (speed of light) rather than with the one-cycle delay that is common with ordinary operations, including no-op. It is thus possible to bypass several processing elements in order to make the desired connection to a processing element which is a non-adjacent Cartesian neighbor. It is also possible to make a zigzag move, to make connection to a remote processing element which is in neither the same column nor the same row. In this fashion, even though the array connections are without diagonals, connection can be made to adjacent diagonal, remote diagonal or off-diagonal remote processing elements.

While additional explanation will be made, it should be understood at this point that the polymorphic mesh image processing system, properly program controlled, can be configured to carry out a processing effort involving a complex interconnection of processing elements. Each processing element does its appropriate arithmetic or logical operation, transform operation or no-op, with respect to information provided to it, or the processing element may be short-circuited so as to be bypassed without cycle delay.

A certain amount of sophistication in connections and in connection logic might be available as a function of the number of bit values in the pattern registers, and as a function of the number of pattern registers, of the connection control mechanism. But such sophistication would be costly, because of the large number of replications required by the large number of processing elements. In order to keep the costs of the processing element down, costs being measured in terms not only of money but of complexity and path length, the preferred embodiment has a limited repertoire of optimized connections. This repertoire is depicted in FIGS. 4 through 20.

FIG. 4 shows the formation of linear arrays. There is a west-to-east linear array 41, and there is a north-to-south linear array 42. FIGS. 5 and 6 show two techniques for forming row trees. In the first technique, X strings 45, 46, 47 and 48 through 51 are shown in FIG. 5. These X strings of row trees are of different lengths. FIG. 6 shows a tree in its more traditional presentation as tree connection 52. In this traditional tree configuration seven inputs filter out to a single processing element in four steps.
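The row-tree idea can be sketched as a doubling reduction along one row: at each level, PEs whose index is a multiple of 2*span receive from the PE `span` positions east, and the PEs in between are short-circuited, so each level is assumed to cost one step however long the hop. This is a minimal sketch assuming a row length that is a power of two; `row_tree_reduce` is a hypothetical name, and the growing spans play the role of the X strings of different lengths in FIG. 5.

```python
import operator


def row_tree_reduce(values, op=operator.add):
    """Reduce one row of PEs to PE 0 in log2(n) steps.

    At each level, PE x (x a multiple of 2*span) combines its value with the
    one from PE x + span; PEs x+1 .. x+span-1 only pass the data through.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("row length must be a power of two")
    vals = list(values)
    span, steps = 1, 0
    while span < n:
        for x in range(0, n, 2 * span):
            vals[x] = op(vals[x], vals[x + span])
        span *= 2
        steps += 1
    return vals[0], steps


print(row_tree_reduce([3, 1, 4, 1, 5, 9, 2, 6]))       # (31, 3)
print(row_tree_reduce([3, 1, 4, 1, 5, 9, 2, 6], min))  # (1, 3)
```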

FIGS. 7-10 show a chip area comparison and communication distance improvement as a result of using polymorphic mesh processing elements. The chip area for polymorphic mesh 61 as shown in FIG. 7 is quite compact as contrasted to pattern 62 of an orthogonal tree, FIG. 9. The average communication distance in a 3x3 window, as shown in FIG. 8 pattern 63, is 1.5 using the polymorphic mesh. The same 3x3 window 64 in FIG. 10 has an average communication distance of 2.125.
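The 1.5 figure for the polymorphic mesh is consistent with counting, for each of the eight neighbors of the window center, the number of mesh hops (the Manhattan distance) and averaging. We assume that is how FIG. 8 was scored; the 2.125 figure for the orthogonal tree depends on the FIG. 10 layout and is not reproduced here. `mean_mesh_distance` is a hypothetical helper.

```python
from fractions import Fraction


def mean_mesh_distance(radius=1):
    """Mean Manhattan hop count from a window centre to the other window PEs."""
    dists = [abs(dx) + abs(dy)
             for dx in range(-radius, radius + 1)
             for dy in range(-radius, radius + 1)
             if (dx, dy) != (0, 0)]
    return Fraction(sum(dists), len(dists))


print(mean_mesh_distance())  # 3/2: four neighbours at 1 hop, four diagonals at 2
```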

FIG. 11 is a diagram showing formation of reverse row trees from polymorphic mesh processing elements. This is very similar to the row trees shown in FIG. 5. While a row tree is used to collect information from its leaves, a reverse row tree is used to distribute information to its leaves.
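Assuming again a power-of-two row, a reverse row tree can be sketched by running the doubling in the opposite direction: the span starts at half the row and halves each step. `reverse_row_tree_broadcast` is a hypothetical name, and the PEs between the ends of each hop are assumed to be short-circuited.

```python
def reverse_row_tree_broadcast(value, n):
    """Distribute `value` from PE 0 to all n PEs of a row in log2(n) steps.

    Mirror image of the row tree: the span starts at n/2 and halves each
    step, and every PE that already holds the value sends it `span` PEs east.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("row length must be a power of two")
    row = [None] * n
    row[0] = value
    span, steps = n // 2, 0
    while span >= 1:
        for x in range(0, n, 2 * span):
            row[x + span] = row[x]
        span //= 2
        steps += 1
    return row, steps


print(reverse_row_tree_broadcast("T", 8))  # (['T', 'T', 'T', 'T', 'T', 'T', 'T', 'T'], 3)
```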

FIGS. 12-20 show selected choices of array configuration conveniently achievable by programmable interconnection of polymorphic mesh processing elements. Certain ones of the processing elements in the figures are shown as squared circles to indicate processing operations, as contrasted to intervening processing elements used simply for pass-through, which are shown as simple circles. Note that each PE can perform both the pass-through function and a processing operation, as programmed. Each pattern is reduced to a set of sixteen bit values and presented via the appropriate pattern register.






0 257 581 



FIG. 21 is a detail of the preferred embodiment polymorphic mesh processing element according to the invention. Note that much of the mechanism shown is relatively standard. This relatively standard hardware is shown outside the broken line box 93. The standard hardware features ALU 96, which provides outputs 1 and 2, as well as appropriate input multiplexers. Inputs may come via instruction terminal 94 and also from registers N, S, E and W 95, which in turn are fed from memory 1 (M1), memory 2 (M2), ALU output 1 or output 2. The local memory 101 is available for appropriate purposes of computation and housekeeping related to the ALU. The local memory 101 can be extended to the external memory, whose content is fed into the processing element via an external memory data wire EMD 102. The local and external memory contents are multiplexed via multiplexer 103 and connected to M1 and/or M2. Output signals out 1 and out 2 are fed back to the same processing element 93 and may also be fed to other processing elements as selected by the CCM.

The EMD connection and the buses M1 and M2 provide an external memory data connection to the internal memory means, processing means and connection control mechanism, whereby the processing element may operate with local memory and external memory simultaneously.

The connection at EMD 102 is important in that it permits direct connection of the individual PE to an external memory, which may be in the host H 3 or may be a standalone memory (not shown). This EMD connection, not present in ordinary processing elements in array processors, makes it possible to provide the equivalent of FIG. 3 from an external memory. The EMD connection also makes possible a supplement to the hardware of FIG. 3, as well as great flexibility of operation and setup of polymorphic-mesh processing element networks.

The switching capability implemented in FIG. 3 via its directional and logic function control is related to external memory and EMD 102. A wide spectrum of implementation means for FIG. 3 is possible, ranging from having all functions illustrated in FIG. 3 resident in each and every processing element to having all these functions resident external to all PEs but delivering the resultant condition via connection EMD 102 to a pattern register selection register Rp 99.

Two pattern registers, PR0 97 and PR1 98, allow instantaneous switching from one connection pattern to the other without loss of any instruction cycle. The instantaneous switching is controlled by a one-bit register, the pattern register selection register (Rp 99). When it is determined that the values in one of the two pattern registers are no longer needed, that pattern register can be loaded with a new pattern. Such loading can be free, that is, it can be carried out at the same time as the concurrent ALU operation. It must be remembered that the processing elements in image processing systems are normally one-bit processing elements, and in any case are relatively simple. It is a relatively significant effort to load the pattern registers 97 and 98, which are 16 bits each. Pattern register selection register 99 must also be loaded, but this is a single bit.
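The double buffering described here can be modeled in a few lines. This is a behavioral sketch only: the class name, the list holding the two registers and the convention that Rp = 0 selects PR0 are assumptions, since the text names the registers but not the bit convention. It illustrates that loading touches only the inactive register, so the crossbar never sees a half-written pattern, and that the switch itself is a single bit flip.

```python
from dataclasses import dataclass, field


@dataclass
class PatternRegisters:
    """PR0 97 / PR1 98 with selection bit Rp 99 (assumed: Rp=0 selects PR0)."""
    pr: list = field(default_factory=lambda: [0, 0])
    rp: int = 0

    def active(self) -> int:
        """Pattern currently driving the crossbar."""
        return self.pr[self.rp]

    def load_inactive(self, pattern: int) -> None:
        """Write a new 16-bit pattern into the register not in use."""
        if not 0 <= pattern < 1 << 16:
            raise ValueError("pattern registers are 16 bits wide")
        self.pr[1 - self.rp] = pattern

    def switch(self) -> None:
        """Flip Rp: the new connection pattern takes effect at once."""
        self.rp ^= 1


regs = PatternRegisters(pr=[0x2000, 0])  # arbitrary example pattern in PR0
regs.load_inactive(0x0002)               # prepare the next pattern in PR1
regs.switch()
print(hex(regs.active()))                # 0x2
```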

There are occasions when it is necessary to halt processing to reload pattern registers 97 and 98. If this occurs, it would normally take 32 cycles to load the pattern registers and a 33rd cycle to load the pattern register selection register 99. In many cases, however, no time need be used specifically for loading the pattern registers. In the case of appropriate instructions, known to be appropriate, the operator can load one bit into pattern register 97, one bit into pattern register 98 or one bit into pattern register selection register 99 during the cycle in which ALU 96 is carrying out an arithmetic or logical operation. Over a period of time during which ALU 96 is operating at full capacity, it thus may be possible to reload one or all of the registers in the CCM. Such free loading is by a path from instruction terminal 94 via the multiplexers in ALU 96 and feedback from out 1 or out 2, assuming appropriate setup of instruction gate 100 to carry out the loading function.
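The cycle accounting can be sketched as follows, assuming bit-serial loading of one bit per cycle, most significant bit first (the bit order is not stated in the text). `piggyback_load` is a hypothetical helper: bits loaded during cycles in which the ALU is busy anyway cost nothing, and only the remainder requires dedicated cycles.

```python
def piggyback_load(pattern, busy_cycles, width=16):
    """Shift `pattern` into a `width`-bit register one bit per cycle, MSB first.

    The first `busy_cycles` bits ride along with ALU instructions at no cost;
    the remaining bits need dedicated load cycles.
    Returns (register contents, dedicated cycles).
    """
    reg, dedicated = 0, 0
    for cycle in range(width):
        bit = (pattern >> (width - 1 - cycle)) & 1
        reg = ((reg << 1) | bit) & ((1 << width) - 1)
        if cycle >= busy_cycles:
            dedicated += 1
    return reg, dedicated


# Halting to reload everything: 16 + 16 pattern bits plus 1 selection bit.
print(piggyback_load(0xBEEF, 0)[1] * 2 + 1)  # 33
print(piggyback_load(0xBEEF, 10))            # (48879, 6)
print(piggyback_load(0xBEEF, 20))            # (48879, 0)
```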

Note that the selected sixteen bit values from pattern register 97 or 98 control the detailed setting of the 4x4 crossbar switch 104, the details of which are shown in FIG. 2.

In a typical operation the processing element will be either short-circuited or active. If short-circuited, a pattern register such as register 97, set for short-circuit connection, takes over and carries out the short-circuit connection via that processing element to one or more other processing elements. When the processing element is active, pattern register selection register 99 would be set for action and would switch control from standard pattern register 97, set for bypass via short circuit, to alternate pattern register 98, set for directing inputs and outputs for the activity.

These activities will be explained in the following paragraphs. 

The polymorphic mesh, shown in FIGS. 1-3, is capable of a number of patterns limited by the complexity of its connection control mechanism CCM. The CCM pattern repertoire, being the union of these patterns, is optimal for the union of the computing types covered separately.
One control algorithm is described in the invention for each pattern respectively. All control algorithms
are simple to implement and most importantly use the same set of hardware to generate the desired pattern 
on-line. 

A hardware mechanism is depicted in the invention to carry out the formation of all patterns in a systematic and consistent way. The mechanism is simple to implement and very suitable for VLSI implementation.

The most distinctive feature of the polymorphic mesh is that many algorithms of linear complexity (O(N)) are reduced to logarithmic complexity (O(log N)), the theoretical optimum. Therefore an N/log N speedup is gained by the architectural novelty; for a network of 1024x1024, the speedup is about 100.
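Taking the logarithm to base 2 (our assumption, consistent with the binary tree reductions discussed earlier but not stated explicitly), the figure for N = 1024 works out as:

```latex
\frac{N}{\log_2 N}\bigg|_{N = 1024} = \frac{1024}{10} = 102.4 \approx 100
```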
Specific to computer vision, after iconic processing by one pattern (mesh), the resulting image is not shipped outside the network. Rather, it is further processed by another pattern (e.g. tree) to transform the iconic information into symbolic information (e.g. how many pixels have values greater than 133?). Because of the polymorphic capability the data do not have to be output; consequently the I/O rate is significantly reduced (e.g. a reduction of five orders of magnitude for the "how many" example on a 1024x1024 image). The saving from the I/O reduction adds to the speedup gained from computing in a compound manner.

Another pattern, the Diagonal-Span Tree, can be formed by the polymorphic mesh to compute Ax + By + C in logarithmic time, where A, B and C are constants and (x, y) is the coordinate of a pixel. This capability is useful in both computer graphics and computer vision. In computer graphics it is useful for displaying convex polygons, creating shadows, clipping, drawing spheres, computing adaptive histogram equalization, texture mapping and anti-aliasing. In computer vision it is useful for generating a line mask, a band mask and a polygonal mask. It is also useful in computing the Fast Hough Transform and its inverse for detecting lines in a noisy image, among other applications.

The preferred embodiment includes a hardware mechanism that generates twelve useful cellular automata patterns, and twelve control algorithms, one for each pattern, that reshape the polymorphic mesh into the corresponding pattern under software control. The following sections describe the polymorphic hardware mechanism and the control algorithms.

One noticeable feature of the polymorphic mesh is its capability to "reshape" itself adaptively to the condition of the processing. As described previously, pattern registers 97 and 98 can be loaded concurrently with the ALU operation; as a result, each PE can be considered to have P patterns (P1, ..., Pp) at its disposal.

Reshaping adaptive to the condition C of the processing allows each PE to assume a pattern Pi as a function of C. Initially, all PEs start from a pattern, say P1, and processing begins under that initial pattern. As processing proceeds, each PE senses its local condition (for example, a test of convergence of a value) and decides whether to keep the current pattern or replace it with a new one.

A condition can be global, meaning that the host senses the conditions of all PEs collectively, then decides the choice of pattern and feeds it back to all PEs. A condition can also be local. The new pattern may be fed to an individual PE or to a group of PEs, as appropriate.

Choice of a new pattern can be made in several ways. 


(1) prescheduled:

a sequence of patterns (e.g. P1, P2, ..., Pp) is scheduled and adopted in that order of priority. When the need for a new pattern is detected while Pi is in use, the next pattern P(i + 1) is used. Then P(i + 2) is loaded, when possible, into the unused pattern register concurrently with the operation.
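A minimal sketch of prescheduled reshaping, assuming patterns are identified by their index in the schedule P1 ... Pp, with pattern registers 97 and 98 modelled as `reg[0]` and `reg[1]` and selection register 99 as `active`. The struct and function names are hypothetical. When a new pattern is needed, the unit switches to the preloaded register at once and refills the idle one with the next scheduled pattern, mirroring the P(i + 1) / P(i + 2) behaviour described above.

```c
/* Hypothetical model of pattern registers 97/98 and selection register 99.
 * A pattern is identified by its index i in the schedule P1..Pp. */
typedef struct {
    int reg[2];        /* the two pattern registers */
    int active;        /* selection register: which of reg[] is in use */
    int pending;       /* 1 if the idle register holds an unused pattern */
    int next_to_load;  /* index of the next pattern to preload */
    int num_patterns;  /* p, the length of the schedule */
} PatternUnit;

void pattern_init(PatternUnit *u, int num_patterns)
{
    u->num_patterns = num_patterns;
    u->active = 0;
    u->reg[0] = 1;                        /* start under P1 */
    u->pending = num_patterns >= 2;       /* preload P2 if it exists */
    u->reg[1] = u->pending ? 2 : 0;
    u->next_to_load = 3;
}

int pattern_current(const PatternUnit *u)
{
    return u->reg[u->active];
}

/* Called when the PE (or the host) detects the need for a new pattern. */
void pattern_advance(PatternUnit *u)
{
    if (!u->pending)                      /* schedule exhausted: keep Pp */
        return;
    u->active ^= 1;                       /* P(i+1) takes effect at once */
    u->pending = 0;
    if (u->next_to_load <= u->num_patterns) {
        /* Refill the idle register with P(i+2); in hardware this load
         * overlaps the ALU operation. */
        u->reg[u->active ^ 1] = u->next_to_load++;
        u->pending = 1;
    }
}
```

Keeping the next pattern preloaded is what lets the switch itself cost no extra cycle; only the refill has to be hidden behind computation.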



(2) function of C: 


An example to show the adaptive reshaping is as follows: 
A 3x3 window pattern is established as P1 to allow a filtering operation, and an index is identified as condition C to measure the effectiveness of the filtering. The goal is to increase the window size when C is not satisfactory. To this end, we establish P2 as a 5x5 window, P3 as 7x7, P4 as 9x9, etc.
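The growing-window rule above can be sketched as follows. `window_side` encodes P1 = 3x3, P2 = 5x5 and so on; the condition C is abstracted as a caller-supplied callback, which is an assumption, since the text does not define the effectiveness index. `min_side_condition` is a toy stand-in for C, used only to exercise the loop.

```c
/* Window side for pattern Pi: P1 = 3x3, P2 = 5x5, P3 = 7x7, P4 = 9x9, ... */
int window_side(int i)
{
    return 2 * i + 1;
}

/* Hypothetical adaptive loop: enlarge the window while condition C is not
 * satisfied, stopping at max_pattern. `condition_ok` is an assumed callback
 * standing in for the effectiveness index; it gets the window side and
 * returns nonzero when C is satisfactory. Returns the index i of Pi. */
int choose_pattern(int (*condition_ok)(int side, void *ctx), void *ctx,
                   int max_pattern)
{
    int i = 1;
    while (i < max_pattern && !condition_ok(window_side(i), ctx))
        i++;
    return i;
}

/* Toy condition for illustration only: C is met once the window side
 * reaches the value pointed to by ctx. */
int min_side_condition(int side, void *ctx)
{
    return side >= *(const int *)ctx;
}
```

With a required side of 7, the loop settles on P3 (7x7); if C is never met, it stops at the last scheduled pattern.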

Another example of adaptive reshaping is the "soft fail" application. It is common to design a condition C in each PE such that a malfunction is reflected in it. Such a condition can then be used to decide a new, suitable connection pattern Pi.



POLYMORPHIC-MESH MECHANISM 


The polymorphic mesh (Figure 1) consists of an array 1 of MxM processing elements (PEs). Each PE has four physical wires communicating with its four non-diagonal neighbors, one wire per neighbor. These wires are denoted E, W, S and N, as shown for PE 4. PE 5 is expanded to show the arithmetic and logical unit (ALU) 6, memory (M) 7, and connection control mechanism (CCM) 8. ALU 6 and memory 7 are relatively standard features of image processor processing elements. The connection control mechanism is not standard; rather, it has the special configuration according to this invention.

As shown in FIGS. 1 and 2, each PE consists of three functional blocks: the connection control mechanism 8, the memory block and the Arithmetic Logic Unit (ALU) block 6. The connection control mechanism 8 takes the four wires (E, W, N and S), the ALU output and the memory (M) as inputs and reroutes them as outputs. The routing is accomplished by "SHORT_CIRCUITing" any input A (for example, 10 in FIG. 2) to any output B. The signal appearing on wire A is logically the same as that on wire B, where A and B can be any of the inputs to the connection control unit. For example, "SHORT_WE" logically equalizes wire W 24 and wire E 26.

FIG. 3 shows the functional units of CCM 8. The "SHORT_CIRCUIT" action is conditional, and the conditions are created by the connection control mechanism shown in Figure 3. Each PE has two 1-bit flags, F1 31 and F2 32, for generating condition signals. Each PE is equipped with one shift register SRM 33, which can shift in both directions logically or arithmetically and supply true and complement outputs 34 and 35.

Each PE contains a pair of registers, X 36 and Y 37, where register X holds the PE's row position (0 ≤ X ≤ M-1) and register Y holds the PE's column position (0 ≤ Y ≤ M-1). Registers X and Y have shift registers SRX 38 and SRY 39 respectively, which can be loaded with the contents of X and Y and can shift in both directions logically or arithmetically. The bit shifted out of SRX is BSRX and that of SRY is BSRY.

Several functions are provided to generate the condition on which the "SHORT_CIRCUIT" action is 
based. 

LOAD reg value: this function loads the "value" into SRM, F1 or F2 via instruction or memory;
COPYSR reg: this function copies reg X to reg SRX, or Y to SRY, or both;
AND/OR/XOR reg: this function performs one of the following:
"AND/OR/XOR" X with SRX and X with SRM (or inverted SRM);
"AND/OR/XOR" Y with SRY and Y with SRM (or inverted SRM);
both of the above;

ANDALLBIT reg: this function performs "AND" on all bits of reg; it produces one condition bit XANDALL if reg is X, one condition bit YANDALL if reg is Y, or both condition bits.

The "SHORT_CIRCUIT" action is then based on the combination of BSRX, BSRY, XANDALL, YANDALL, F1 and F2.
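A hypothetical software model of the FIG. 3 condition registers may help show how the per-step bits are produced. The bit width, the struct layout and the function names are assumptions; the behaviour follows the text: COPYSR copies X into SRX and Y into SRY, ANDALLBIT reports whether every bit of a register is 1, and shifting SRX right by one bit per time step (the direction used by the row-tree algorithms) delivers pid<0>, pid<1>, ... in BSRX.

```c
#include <stdint.h>

/* Hypothetical model of the condition-generating registers of FIG. 3 for an
 * M x M mesh with M = 2^width (width < 32). */
typedef struct {
    unsigned width;     /* log2 M: number of significant bits in X and Y */
    uint32_t X, Y;      /* PE position registers */
    uint32_t SRX, SRY;  /* shift registers loaded from X and Y */
    uint32_t SRM;       /* general shift register (true output) */
    int F1, F2;         /* 1-bit flags */
} ConditionUnit;

static uint32_t low_mask(unsigned width)
{
    return (1u << width) - 1u;
}

/* COPYSR: copy X to SRX and Y to SRY. */
void copysr(ConditionUnit *c)
{
    c->SRX = c->X;
    c->SRY = c->Y;
}

/* ANDALLBIT: 1 only if every one of the `width` low bits is 1. */
int andallbit(uint32_t value, unsigned width)
{
    return (value & low_mask(width)) == low_mask(width);
}

/* Shift SRX right by one bit; the bit shifted out is BSRX. Called at time
 * steps t = 0, 1, 2, ... it yields pid<0>, pid<1>, pid<2>, ... */
int shift_out_bsrx(ConditionUnit *c)
{
    int bsrx = (int)(c->SRX & 1u);
    c->SRX >>= 1;
    return bsrx;
}
```

For a PE with X = 110, successive calls to `shift_out_bsrx` return 0, 1, 1; combining such a bit with a flag like F1 is what the control algorithms below use to decide between SHORT, SEND and RECEIVE.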

The remaining two functional blocks of the polymorphic mesh are rather similar to conventional designs. The memory block can be viewed as storage that can deliver one bit to the connection control block and/or accept one bit from it per machine cycle. The ALU, although similar to conventional designs, features a "conditional" response to the combination of bits BSRX, BSRY, XANDALL and YANDALL in choosing a "SEND" or a "RECEIVE" action along with the "SHORT_CIRCUIT" action.

With the polymorphic-mesh mechanism, the twelve patterns formed by the polymorphic mesh are described in the following. Their corresponding control algorithms are described in the same order.
CONTROL ALGORITHMS
(PI) Linear Array 

As shown in Figure 4, a row linear array of length MxM and a column linear array of the same length can be formed from an MxM polymorphic mesh by connecting S of PE (M-1, i) to N of PE (0, i + 1) and connecting E of PE (i, M-1) to W of PE (i + 1, 0). The N of PE (0, 0) and the S of PE (M-1, M-1) are the beginning and the end of the column linear array respectively, while the W of PE (0, 0) and the E of PE (M-1, M-1) are the beginning and the end of the row linear array.

The control algorithm is as follows. In each PE cycle:



LINEAR ()
{
    MEM = W;    /* action 1 */
    E = MEM;    /* action 2 */
    MEM = N;    /* action 3 */
    S = MEM;    /* action 4 */
}



Action (1) takes the datum on W into MEM at the end of the cycle, while action (2) puts the content of MEM (at the beginning of the cycle) on E. In combination, actions (1) and (2) create the row array. Data injected through W of PE (0, 0) march eastbound, and after M cycles the first row is filled. After another M cycles, the data in the first row have marched to the second row while new data fill the first row. Actions (3) and (4) create a column linear array in a similar fashion.

The formation of the linear array is unconditional: all PEs take the same action, and the conditional mechanism in the connection control mechanism is not utilized.
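The row linear array can be checked with a small simulation, assuming a 4x4 mesh (`LA_M`, a name chosen for the sketch) and modelling MEM as one integer per PE. Because E of PE(i, M-1) feeds W of PE(i+1, 0), the whole mesh behaves as one row-major shift register of length MxM; updating from the last PE backwards gives each PE the value its predecessor drove on E at the start of the cycle.

```c
#define LA_M 4   /* assumed side size of the mesh for this sketch */

/* One PE cycle of the row linear array: every PE latches MEM = W, where W
 * carries the E output (old MEM) of its predecessor in row-major order.
 * Walking from the last PE backwards preserves start-of-cycle values. */
void linear_cycle(int mem[LA_M][LA_M], int input)
{
    for (int k = LA_M * LA_M - 1; k > 0; k--)
        mem[k / LA_M][k % LA_M] = mem[(k - 1) / LA_M][(k - 1) % LA_M];
    mem[0][0] = input;   /* datum injected through W of PE(0, 0) */
}
```

Feeding the inputs 1, 2, ..., 8 (2M cycles) leaves 8, 7, 6, 5 in row 0 and 4, 3, 2, 1 in row 1: the first M inputs have marched into the second row, as described above.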
(P2) Row Tree

M trees (one per row) can be formed from the polymorphic mesh by the following control algorithm. 


ROW-TREE ()
{
    int t;              /* t is the time step */
    int pid0;           /* column position of a PE, process ID 0 */
    int M, logM;        /* M is the side size of the mesh and logM = log M */
    int treemask = 1;   /* a flag to construct the tree */

    for (t = 0; t < logM; t++) {
        if (!treemask)
            {SHORT_WE; DISABLE;}
        if (treemask && !pid0<t>)
            E = MEM;
        if (treemask && pid0<t>)
            MEM = W;
        treemask = treemask & pid0<t>;
    }
}



Figure 5 illustrates the above control algorithm for an 8-PE case. At t=0, the "treemask"s of all PEs are 1; therefore every PE is enabled and the "even PEs" send data to the "odd PEs". This is indicated by an arrow between every pair of "odd/even" PEs. This also forms the bottom level of the tree.

At t=1, by examining treemask, only the PEs with pid0<0> = 1 (the lowest bit of pid0) are enabled. The disabled PEs are not shown as circles, but they do establish the connections between PE 1 and 3, and between PE 5 and 7, by the action SHORT_WE. This forms the second level of the tree.

The highest level of the tree is formed by the control at t=2. At this step, only PE 3 and PE 7 are enabled; the remaining PEs establish the connection by "SHORTing" the W-E path but do not perform any operation (or, equivalently, do not change any state of MEM).

An alternative view of this control algorithm is shown in Figure 6, with the nodes at each level of the tree marked by the processor identification (pid) of the PEs.
The tree pattern is extremely useful in the divide-and-conquer paradigm. Dedicated tree machines have been built for special-purpose computing. The complexity of algorithms in this paradigm is usually O(log N), where N is the size of the input data. The same algorithms usually need O(N) execution time in a mesh; this represents a 1024:10 speedup for N = 1024. Important algorithms in this category include MAX, MIN, k-th largest, median, some/none, etc.

Using the mechanism in the connection control block, the control algorithm uses register X for pid0, shift register SRX to shift out its bits, and flag F1 for "treemask". The content of register X is copied to SRX so that bit pid0<t> is shifted out into BSRX at time step t, then ANDed with "treemask" to derive the final condition.
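The ROW-TREE schedule can be simulated for one row, under the assumption that a receiver combines its own MEM with the incoming datum by taking the maximum (the patent only routes the data; MAX is one of the tree operations it lists). `enabled` reproduces the treemask value at step t, and the westward walk models the SHORT_WE chain of disabled PEs.

```c
/* treemask of PE p at step t: 1 iff the low t bits of pid0 are all 1. */
static int enabled(int p, int t)
{
    int low = (1 << t) - 1;
    return (p & low) == low;
}

/* Simulates ROW-TREE on one row of n PEs (n a power of two) and leaves the
 * row maximum in the root, PE n-1. At step t an enabled PE with pid0<t> = 0
 * drives E = MEM, an enabled PE with pid0<t> = 1 latches MEM = W, and each
 * disabled PE shorts W to E, so a receiver hears the nearest enabled PE to
 * its west. Combining with max() is an assumed receive operation. */
void row_tree_max(int *mem, int n)
{
    for (int t = 0; (1 << t) < n; t++) {
        for (int p = 0; p < n; p++) {
            if (!enabled(p, t) || !((p >> t) & 1))
                continue;                 /* not a receiver at this step */
            int s = p - 1;
            while (!enabled(s, t))        /* follow the SHORT_WE chain west */
                s--;
            if (mem[s] > mem[p])          /* senders are never receivers, */
                mem[p] = mem[s];          /* so in-place update is safe   */
        }
    }
}
```

With the row {3, 9, 1, 4, 7, 2, 8, 5}, PE 3 holds 9 after t = 1 and the root PE 7 holds the row maximum, 9, after t = 2, following the pairing of Figure 5.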



(P3) Column Tree 

Similar to the Row Tree, M column trees can be formed by the polymorphic mesh with the row position of the PEs, pid1, as the control and the N-S path as the connection. The control algorithm is as follows.



COLUMN-TREE ()
{
    int t;              /* t is the time step */
    int pid1;           /* row position of a PE */
    int M, logM;        /* M is the side size of the mesh and logM = log M */
    int treemask = 1;   /* a flag to construct the tree */

    for (t = 0; t < logM; t++) {
        if (!treemask)
            {SHORT_NS; DISABLE;}
        if (treemask && !pid1<t>)
            S = MEM;
        if (treemask && pid1<t>)
            MEM = N;
        treemask = treemask & pid1<t>;
    }
}

Column trees are useful for transporting data distributed column-wise in the mesh and for converting algorithms of linear complexity, O(N), to logarithmic complexity, O(log N), as discussed in the Row Tree section.

Similar to ROW-TREE, the COLUMN-TREE control algorithm uses register Y for pid1, SRY to copy Y, and F2 to hold "treemask". The condition for SHORT_NS is based on the ANDing of F2 and BSRY, which produces pid1<t> at time step t.
(P4) Orthogonal Tree

The Orthogonal Tree (FIG. 9) is a useful network for sorting, matrix operations, minimum spanning tree, FFT and other graph algorithms. It can be formed from the polymorphic mesh by combining the Row Trees and the Column Trees with the ORTH_TREE control algorithm below.



ORTH_TREE ()
{
    int t;                     /* t is the time step */
    int pid0;                  /* column position of a PE, process ID 0 */
    int pid1;                  /* row position of a PE */
    int M, logM;               /* M is the side size of the mesh and logM = log M */
    int hmask = 1, vmask = 1;  /* flags to construct the tree */

    for (t = 0; t < logM; t++) {
        /* cycle 1 */
        if (!hmask)
            {SHORT_WE; DISABLE;}
        if (hmask && !pid0<t>)
            E = MEM;
        if (hmask && pid0<t>)
            MEM = W;
        hmask = hmask & pid0<t>;

        /* cycle 2 */
        if (!vmask)
            {SHORT_NS; DISABLE;}
        if (vmask && !pid1<t>)
            S = MEM;
        if (vmask && pid1<t>)
            MEM = N;
        vmask = vmask & pid1<t>;
    }
}



Key advantages of forming the Orthogonal Tree from the polymorphic mesh are:

(1) reduction of chip area: the chip areas required to lay out the mesh and the orthogonal tree are O(N^2) and O(N^2 (log N)^2) respectively, where N is the side size of the mesh and the number of leaves of the orthogonal tree. This represents a saving by a factor of (log N)^2. For N = 1024, the chip area used by the polymorphic mesh is 1/100 of that of the orthogonal tree.

(2) efficient neighborhood operations: PEs in the Orthogonal Tree are not connected to their geographically nearest neighbors; hence, for image processing, many important neighborhood operations cannot be performed efficiently because there are no direct communication paths. In fact, more than half of the data in a 3x3 window must be passed up one level in the tree and then passed down to the center of the window; the average distance among the data in a 3x3 window is 2.125, as against 1.5 in the polymorphic mesh. A sample 3x3 window for the orthogonal tree is shown in Figures 8 and 10, where the number in each circle is the distance between that datum and the center of the window.

As summarized in Figure 7 for N=4, the ratio of chip area between the polymorphic mesh and the orthogonal tree is 16:46, and that of the average distance in a 3x3 window is 1.5:2.1 (Figures 8 and 10).
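The 1.5 figure for the polymorphic mesh can be reproduced by counting mesh hops from the window center to its eight neighbors (four at distance 1, four at distance 2), assuming the average is taken over the neighbors only; the (log N)^2 area factor is checked for N = 1024. The orthogonal-tree figure of 2.125 depends on Figures 8 and 10 and is not recomputed in this sketch; the function names are illustrative.

```c
#include <stdlib.h>

/* Average mesh (Manhattan) distance from the center of a 3x3 window to its
 * eight neighbors: four at distance 1 and four at distance 2. */
double mesh_window_avg_distance(void)
{
    int total = 0, count = 0;
    for (int dr = -1; dr <= 1; dr++)
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0)
                continue;                 /* skip the center itself */
            total += abs(dr) + abs(dc);
            count++;
        }
    return (double)total / count;
}

/* Area saving factor (log2 N)^2 of the mesh over the orthogonal tree. */
int area_saving_factor(unsigned n)
{
    int logn = 0;
    while ((1u << logn) < n)
        logn++;
    return logn * logn;
}
```

The total hop count is 12 over 8 neighbors, giving 1.5; for N = 1024, log2 N = 10 and the factor is 100, matching the 1/100 area ratio quoted above.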

35 The control algorithm ORTH_JTREE uses register X for pidO, SRX to copy pidO, F1 for hmask, and 
produces pid0<t> at time step tin BSRX. Symmetrically, the control algorithm uses register Y for pidl, SRY 
to copy pidl, F2 for vmask. and produces ptd1<t> at time step t in BSRY. The conditional SHORT_WE 
and SHORTENS are based on BSRX. BSRY, F1 and F2. 


(P5) Reverse-Row Tree 

The RR-Tree is a top-down tree (as against row trees, which are bottom-up) that is formed from the top level to the bottom level of the tree, the reverse of the row tree process. The control algorithm is shown below.
RR-TREE ()
{
    int t;                  /* t is the time step */
    int pid0;               /* column position of a PE */
    int M, logM;            /* M is the side size of the mesh and logM = log M */
    int treemask = M/2;     /* a flag to construct the tree */
    int mask;               /* an intermediate condition */

    for (t = 0; t < logM; t++) {
        mask = ANDALLBIT(~treemask | pid0);
        if (!mask)
            {SHORT_WE; DISABLE;}
        if (mask && pid0<logM-t-1>)
            W = MEM;
        if (mask && !pid0<logM-t-1>)
            MEM = E;
        treemask = ASHIFT(treemask, 1);
    }
}



As shown in Figure 11 by an 8-PE example, the control algorithm can be described as follows. The flag treemask is initialized to one half of the total number of PEs (i.e. 4 = 100). The INVERTed "treemask" is first "ORed" with pid0, then the result is passed to ANDALLBIT, which returns a '1' in 'mask' if all bits of the result are '1' and a '0' otherwise. The PEs with mask=1 (i.e. PE 3 and 7) are part of the tree; the rest, not being tree nodes, disable themselves and SHORT their W-E path to establish the tree connection. For PE 3 and 7, bit 2 of pid0 is further checked: a '1' in this bit lets PE 7 send its datum to the receiver PE 3, whose bit 2 is '0'. This forms the top level of the tree at t=0.

At the end of t=0, treemask is shifted arithmetically one bit to the right. It therefore becomes 110 for the next time step.

At t=1, the same process identifies PEs 1, 3, 5 and 7 as the tree nodes; PE 7 sends data to PE 5 and PE 3 to PE 1. Treemask becomes 111 at the end of t=1.

At t=2, every PE is a tree node, and each PE with an odd pid0 sends data to its neighbor with the next lower (even) pid0.

Using the mechanism in the connection control block, "treemask" is loaded into SRM and X is copied to SRX. Register X is "ORed" with the INVERTed SRM first; the result is then "ANDALLBITed" to produce "mask". SRX is shifted left logically to produce pid0<logM-t-1> in BSRX at time step t. This is used to control the SEND/RECEIVE action for a pair of tree nodes.
The RR-Tree is mainly for propagating a datum to all tree nodes, which perform different operations on this datum depending on their positions in the tree. For computer graphics this pattern is useful because it allows each PE to generate A*X simultaneously, where A is a constant and X is pid0. Evaluating A*X in parallel allows fast generation of a line; this is discussed further for another pattern, the Diagonal-Span Tree (P12).
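A C sketch of generating A*X with the reverse row tree, for one row of n = 2^logn PEs. Assumptions: the root (highest pid0) starts with A*(n-1), and the position-dependent operation is that a receiver getting value v from a sender d PEs to its east stores v - A*d. One caution on the node test: the text describes ORing the INVERTed treemask with pid0, but the node sets it lists for Figure 11 (PE 3 and 7 at t=0, then PEs 1, 3, 5 and 7) come out of ANDALLBIT(treemask | pid0) with treemask = 100, 110, 111, so this sketch uses that uninverted form. The function name is hypothetical.

```c
/* RR-TREE on one row of n = 2^logn PEs; afterwards mem[p] == a * p. */
void rr_tree_ax(long *mem, int logn, long a)
{
    int n = 1 << logn;
    unsigned all = (unsigned)n - 1u;            /* logn one-bits */
    unsigned treemask = (unsigned)n >> 1;       /* e.g. 100 for n = 8 */
    mem[n - 1] = a * (n - 1);                   /* value held by the root */

    for (int t = 0; t < logn; t++) {
        int b = logn - t - 1;                   /* bit examined: pid0<logM-t-1> */
        int d = 1 << b;                         /* sender-to-receiver distance */
        for (int p = 0; p < n; p++) {
            int mask = (((treemask | (unsigned)p) & all) == all);
            if (mask && ((p >> b) & 1))         /* sender: W = MEM */
                mem[p - d] = mem[p] - a * d;    /* receiver: MEM = E, adjusted */
        }
        /* ASHIFT: arithmetic right shift within logn bits (100 -> 110 -> 111) */
        treemask = (treemask >> 1) | (treemask & (1u << (logn - 1)));
    }
}
```

For n = 8 the sends are 7 to 3 at t=0, 7 to 5 and 3 to 1 at t=1, and every odd PE to its even neighbor at t=2, so after three steps each PE holds A*pid0. Senders and receivers never coincide within a step, which is why the in-place update is safe.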

In general, a reverse tree is used to convert a symbolic representation in a parameter space into an iconic representation in image space; the algorithm is then performed iconically with the massive parallelism available in the polymorphic mesh.

Although the control algorithm uses the PE with the highest pid0 as the root of the tree, the PE with the lowest pid0 can be used as the root as well, and the control algorithm is of equivalent complexity.

(P6) Reverse Column Trees (RC-Tree) 

Similar to the RR-Tree, the RC-Tree can be formed by using pid1 as the control and N-S as the path to establish the tree connection. This is shown in the following control algorithm.
RC-TREE ()
{
    int t;                  /* t is the time step */
    int pid1;               /* row position of a PE */
    int M, logM;            /* M is the side size of the mesh and logM = log M */
    int treemask = M/2;     /* a flag to construct the tree */
    int mask;               /* an intermediate condition */

    for (t = 0; t < logM; t++) {
        mask = ANDALLBIT(~treemask | pid1);
        if (!mask)
            {SHORT_NS; DISABLE;}
        if (mask && pid1<logM-t-1>)
            N = MEM;
        if (mask && !pid1<logM-t-1>)
            MEM = S;
        treemask = ASHIFT(treemask, 1);
    }
}


The properties of RC-Trees are the same as those of RR-Trees, except that RC-Trees relate to the data in the columns of the mesh.



(P7) Row-Bus

For broadcasting purposes, a bus is a very useful pattern because its broadcasting distance is the shortest. One bus can be formed for every row of the polymorphic mesh by the following control algorithm.

ROW_BUS ()
{
    int sender;     /* ID of the sender */
    int pid0;

    SHORT_WE;
    if (pid0 == sender)
        E = MEM;
    else
        MEM = W;
}

One PE in a row is designated as the "sender" and the rest of the PEs are the receivers. All PEs "SHORT" their E-W path to establish the bus; the sender sends the data to E (or W) while the receivers receive the data from W (or E). (In the case where a datum is injected into W or E by the external controller, there is no "sender" PE and all PEs are "receivers".)

Using the mechanism provided by the connection control block, "sender" is loaded into SRM. The "INVERTed" SRM is "XORed" with register X, which stores pid0; the resulting bits are "ANDALLBITed". A '1' in XANDALL identifies the PE as the sender, and the PEs with XANDALL=0 are the receivers.
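The sender test rests on a simple identity: ~s XOR x equals ~(s XOR x), so all logM bits are 1 exactly when x equals s. A small C sketch (function names hypothetical) checks that identity and models one bus cycle on a row, treating the shorted E-W line as a single shared value.

```c
#include <stdint.h>

/* Sender test of ROW_BUS: ANDALLBIT(~SRM XOR X) over logm bits. */
int is_sender(uint32_t srm_sender, uint32_t x_pid0, unsigned logm)
{
    uint32_t all = (1u << logm) - 1u;
    return ((~srm_sender ^ x_pid0) & all) == all;
}

/* One bus cycle on a row of n PEs: all PEs short W-E, the sender drives
 * the shared line (E = MEM), and every receiver latches it (MEM = W). */
void row_bus(int *mem, int n, unsigned logm, uint32_t sender)
{
    int bus = 0;
    for (int p = 0; p < n; p++)
        if (is_sender(sender, (uint32_t)p, logm))
            bus = mem[p];
    for (int p = 0; p < n; p++)
        if (!is_sender(sender, (uint32_t)p, logm))
            mem[p] = bus;
}
```

With sender = 5 on an 8-PE row, only PE 5 sees XANDALL = 1, and after one cycle every PE holds PE 5's datum.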

(P8) Column Bus

Similar to the Row Bus, a Column Bus can be formed for each column of the polymorphic mesh by using pid1 as the control and N-S as the path, as shown in the following control algorithm.

COLUMN_BUS ()
{
    int sender;     /* ID of the sender */
    int pid1;

    SHORT_NS;
    if (pid1 == sender)
        S = MEM;
    else
        MEM = N;
}



The properties of the column bus are the same as those of the row bus.

In combination, the row bus and the column bus can broadcast a common datum to all PEs in the mesh in two steps. In the first step, the common datum is broadcast to all PEs in the top row; then, in the second step, the PEs in the top row broadcast the common datum to all other PEs along the column direction.



(P9) Pyramid 

The pyramid configuration is powerful in image processing and computer vision mainly because of its capability to handle multi-resolution images. This pattern can be formed from the mesh by the following control algorithm:
PYRAMID ()
{
    int t;                     /* t is the time step */
    int pid0;                  /* column position of a PE */
    int pid1;                  /* row position of a PE */
    int M, logM;               /* M is the side size of the mesh and logM = log M */
    int hmask = 1, vmask = 1;  /* two flags to construct the pyramid */

    for (t = 0; t < logM; t++) {
        /* cycle 1 action */
        if (!hmask | !vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && !pid0<t> && !pid1<t>)
            E = MEM;
        if (hmask && vmask && pid0<t> && !pid1<t>)
            {S = MEM; MEM1 = W;}
        if (hmask && vmask && !pid0<t> && pid1<t>)
            E = MEM;
        if (hmask && vmask && pid0<t> && pid1<t>)
            {MEM0 = N; MEM2 = W;}

        /* cycle 2 action */
        if (!hmask | !vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && !pid0<t> && !pid1<t>)
            NO_ACTION;
        if (hmask && vmask && pid0<t> && !pid1<t>)
            S = MEM1;
        if (hmask && vmask && !pid0<t> && pid1<t>)
            NO_ACTION;
        if (hmask && vmask && pid0<t> && pid1<t>)
            MEM1 = N;

        hmask = hmask & pid0<t>;
        vmask = vmask & pid1<t>;
    }
}

The control algorithm consists of log M steps, and each step has two control cycles. In other words, each step forms one level of the pyramid in two PE cycles.

Figures 12, 13 and 14 depict the pyramid control algorithm on a sample 8x8 mesh. Two masks, hmask (for rows) and vmask (for columns), are initialized to '1' so that all PEs in the mesh are 'enabled' at the first time step. At t=0, all PEs are active and every 2x2 block of PEs forms a group. These four (2x2) PEs are the NW, NE, SW and SE sons of the pyramid, and the parent is the same PE as the SE son. The activities of the four sons are distinguished by the pid0<t> and pid1<t> bits. The SE son (or the parent), designated by pid0<0> = pid1<0> = 1, receives data from the SW (pid0<0>=0 and pid1<0>=1) and NE (pid0<0>=1 and pid1<0>=0) sons in the first cycle. In this cycle, the NW son routes its data to the NE son; these data are received by the parent in the second cycle. In the second cycle, only the NE son and the parent are involved in sending and receiving; the other two PEs take no action. Both vmask and hmask are then updated to control the connection of the next time step.

At t=1, again four PEs form the four sons and one parent of the next-level pyramid, but these four PEs span a 4x4 block of the mesh, as shown in Figure 13. The activity of the four sons and the parent is the same as at t=0, except that PEs in even rows or even columns are disabled. These disabled "non-pyramid" PEs SHORT their W-E line and N-S line to establish the pyramid connection.

PEs that constitute the last-level pyramid are shown in Figure 14. Their activities are the same as in the previous two steps.

Orthogonal to the pyramidal structure described above, the pyramid pattern has a mesh connection at each level. That is, each node in the pyramid has four neighbors (N, S, E, W) at the same level in addition to four sons at the level below and one parent at the level above. This relation is shown in Figure 15.
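A C simulation of the PYRAMID data flow on an assumed 8x8 mesh (`PY_M`) may make the level structure concrete. Each parent adds the values of its three other sons; the adding is an assumption, since the control algorithm only moves the data. As in the listing, pid0 is the column, pid1 the row, and row 0 is north. At step t the enabled PEs are those whose low t bits of pid0 and pid1 are all 1, a parent has pid0<t> = pid1<t> = 1, and its NE, SW and NW sons lie d = 2^t PEs to the north, west and north-west.

```c
#define PY_M 8   /* assumed mesh side for the sketch (a power of two) */

/* Builds the pyramid bottom-up, summing each 2x2 group of sons into its
 * SE son (the parent). Returns the number of levels built, log2(PY_M);
 * the apex, PE(PY_M-1, PY_M-1), ends up holding the sum of the mesh. */
int pyramid_sum(long v[PY_M][PY_M])
{
    int levels = 0;
    for (int d = 1; d < PY_M; d <<= 1, levels++) {
        int low = d - 1;                 /* low t bits that must all be 1 */
        for (int r = 0; r < PY_M; r++)
            for (int c = 0; c < PY_M; c++) {
                int en = (r & low) == low && (c & low) == low;
                if (en && (r & d) && (c & d))    /* parent = SE son */
                    v[r][c] += v[r - d][c]       /* NE son, cycle 1 via N */
                             + v[r][c - d]       /* SW son, cycle 1 via W */
                             + v[r - d][c - d];  /* NW son, relayed by NE */
            }
    }
    return levels;
}
```

Sons always have a 0 in bit t of pid0 or pid1, so no parent reads a value updated in the same step. With PE(r, c) initialized to 8r + c, PE(1, 1) holds 18 after the first level and the apex holds 2016 after three levels.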
The control algorithm for obtaining the neighbors at the same level has been embedded in the above pyramid control algorithm. For example, at t=0, as shown in Figure 8a, the neighbors at the same level are connected by the original mesh. At t=1, the neighbors at the same level are scattered in every other row and column, and the mesh connection for them has been established by the above-mentioned control algorithm PYRAMID.

To obtain the contents of its neighbors at the same level of the pyramid, two control cycles are added to every step of the control algorithm PYRAMID as follows. Cycle 3 obtains the contents of N and W, while cycle 4 obtains those of S and E.
PYRAMID ()
{
    int t;                     /* t is the time step */
    int pid0;                  /* column position of a PE */
    int pid1;                  /* row position of a PE */
    int M, logM;               /* M is the side size of the mesh and logM = log M */
    int hmask = 1, vmask = 1;  /* two flags to construct the pyramid */

    for (t = 0; t < logM; t++) {
        /* cycle 1 action */
        if (!hmask | !vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && !pid0<t> && !pid1<t>)
            E = MEM;
        if (hmask && vmask && pid0<t> && !pid1<t>)
            {S = MEM; MEM1 = W;}
        if (hmask && vmask && !pid0<t> && pid1<t>)
            E = MEM;
        if (hmask && vmask && pid0<t> && pid1<t>)
            {MEM0 = N; MEM2 = W;}

        /* cycle 2 action */
        if (!hmask | !vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && !pid0<t> && !pid1<t>)
            NO_ACTION;
        if (hmask && vmask && pid0<t> && !pid1<t>)
            S = MEM1;
        if (hmask && vmask && !pid0<t> && pid1<t>)
            NO_ACTION;
        if (hmask && vmask && pid0<t> && pid1<t>)
            MEM1 = N;

        /* cycle 3 action: obtain the contents of N and W */
        if (!hmask | !vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask) {
            S = MEM3;
            E = MEM4;
            MEM3 = N;
            MEM4 = W;
        }

        /* cycle 4 action: obtain the contents of S and E */
        if (!hmask | !vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask) {
            N = MEM5;
            W = MEM6;
            MEM5 = S;
            MEM6 = E;
        }

        hmask = hmask & pid0<t>;
        vmask = vmask & pid1<t>;
    }
}

Using the mechanism in the connection control block, hmask and vmask are loaded into F1 and F2 respectively. The pid0 in register X and pid1 in register Y are copied to SRX and SRY respectively. Shifting them logically to the right, BSRX and BSRY contain pid0<t> and pid1<t> at time step t. These two condition bits, along with F1 and F2, are used to implement the SHORT actions.

The above-described pyramid has a base of MxM and a shrinkage of 2, meaning that the level above the base has M/2 x M/2 PEs, and so on. The PYRAMID control algorithm can be extended to handle any shrinkage K, where K is a power of 2, by updating hmask and vmask by
hmask = hmask & pid0<t> & pid0<t+1>
vmask = vmask & pid1<t> & pid1<t+1>
and by skipping the pyramid node actions on every odd t step.
(P10) Reverse Pyramid

Information flows from the bottom to the top level of a pyramid for iconic-to-symbolic conversion. However, information must also flow in the opposite direction for symbolic-to-iconic conversion. This is served by the Reverse Pyramid (R-Pyramid), formed from the polymorphic mesh by the following control algorithm.



R-PYRAMID ()
{
    int t;                /* t is the time step */
    int pid0;             /* row position of a PE */
    int pid1;             /* column position of a PE */
    int M, logM;          /* M is the side size of the mesh and logM = log M */
    int mask = M/2;       /* a flag to construct the pyramid */
    int hmask, vmask;

    for (t = 0; t < logM; t++) {
        hmask = ANDALLBIT(~mask | pid0);
        vmask = ANDALLBIT(~mask | pid1);

        /* cycle 1 */
        if (!hmask | !vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && pid0<logM-t-1> && pid1<logM-t-1>)
            N = MEM2;
        if (hmask && vmask && pid0<logM-t-1> && !pid1<logM-t-1>)
            MEM2 = S;
        if (hmask && vmask && !pid0<logM-t-1> && pid1<logM-t-1>)
            NO_ACTION;
        if (hmask && vmask && !pid0<logM-t-1> && !pid1<logM-t-1>)
            NO_ACTION;

        /* cycle 2 */
        if (!hmask | !vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if (hmask && vmask && pid0<logM-t-1> && pid1<logM-t-1>)
            {N = MEM1; W = MEM3;}
        if (hmask && vmask && pid0<logM-t-1> && !pid1<logM-t-1>)
            {MEM1 = S; W = MEM2;}
        if (hmask && vmask && !pid0<logM-t-1> && pid1<logM-t-1>)
            MEM3 = E;
        if (hmask && vmask && !pid0<logM-t-1> && !pid1<logM-t-1>)
            MEM2 = E;

        mask = ASHIFT(mask, 1);
    }
}

The control algorithm for R-PYRAMID is the reverse of the PYRAMID control algorithm and is an extension of RR-TREE and RC-TREE.

One half of the mesh size (the "mask") is loaded into register SRM, while pid0 in X and pid1 in Y are copied to SRX and SRY respectively. The "INVERTed" SRM is "ORed" with X and then "ANDALLBITed" to produce the flag "hmask" in XANDALL. Similarly, "vmask" is produced in YANDALL. Along with these two condition bits, pid0 and pid1 are shifted logically from left to right to produce pid0<logM-t-1> and pid1<logM-t-1> at time step t in BSRX and BSRY respectively. Figures 16, 17 and 18 illustrate the forming of a pyramid in an 8x8 polymorphic mesh.

As in PYRAMID, each step of the R-PYRAMID control algorithm consists of two cycles. The first cycle is an intermediate stage of sending data to the NW son (data are routed to the NE son in the first cycle and to the NW son in the second cycle); in the second cycle, while the NE son routes the data to the NW son, the parent sends data to the NE and SW sons at the same time.

Connections to the neighbors at the same level of the pyramid can also be established by the R-PYRAMID control algorithm, similarly to PYRAMID. However, communication at the same level is generally not used for information flowing top-down. When necessary, actions similar to cycles 3 and 4 of the PYRAMID control algorithm can be used.
The above-described pyramid has a base of MxM and a shrinkage of 2, meaning that the level above the base has M/2 x M/2 PEs, and so on. The R-PYRAMID control algorithm can be extended to handle any shrinkage K, where K is a power of 2, by initializing SRM to M/(log K) and shifting SRM by log K bits per step.
(P11) Cube

The cube is the most natural extension of the polymorphic mesh into a 3D structure. The usefulness of the cube for 3D data structures (e.g. 3D image element = volume = voxel) is analogous to the usefulness of the mesh for 2D data structures (e.g. 2D picture element = area = pixel).

To form the cube from the mesh, the data structure in the third dimension is sliced vertically and allocated to one PE, so that communication in the third dimension is accomplished by local memory communication, while communication involving the other two dimensions is served by the mesh. In fact, the polymorphic feature is not used in forming this pattern, and the cube formation may not be as novel as the other eleven patterns. Nevertheless, the cube pattern is supported by the polymorphic mesh with a saving of the connection pins in the third dimension. Since the saving is significant (2xMxM pins in total), forming the cube from the polymorphic mesh is important to its VLSI implementation.

With the data slicing, an MxMxK cube can be formed, where K is an integer whose value is limited only by the amount of local memory in the PE.
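A hypothetical memory layout for the data-slicing scheme, assuming each voxel occupies a fixed number of words in local memory; `max_cube_depth`, `voxel_address` and `cube_pins_saved` are illustrative names. The pin count reads the text's 2xMxM as one up-link and one down-link per PE, which is our interpretation rather than a statement from the source.

```c
#include <stddef.h>

/* Data slicing: the column of voxels above pixel (x, y) is stored in the
 * local memory of PE (x, y), words_per_voxel words per voxel, so a step in
 * z is a local memory access while steps in x and y use the mesh. */

/* Largest K for which an M x M x K cube fits in the given local memory. */
size_t max_cube_depth(size_t local_words, size_t words_per_voxel)
{
    return local_words / words_per_voxel;
}

/* Local address of voxel z inside its PE; -1 if z is beyond K. */
long voxel_address(size_t z, size_t local_words, size_t words_per_voxel)
{
    if (z >= max_cube_depth(local_words, words_per_voxel))
        return -1;
    return (long)(z * words_per_voxel);
}

/* Pins saved by not wiring a third dimension: taken here as one up-link
 * and one down-link per PE, i.e. 2 x M x M in total. */
long cube_pins_saved(long m)
{
    return 2 * m * m;
}
```

For example, with 4096 words of local memory and 2 words per voxel, K can reach 2048, and voxel z = 3 sits at local address 6.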


(P12) Diagonal-Span Tree (DST) 

A Diagonal-Span Tree (DST) is a binary tree whose leaves span the diagonal of the mesh once and only once. By this definition, the DST in an NxN mesh has N leaves, each of which occupies a diagonal node designated as PE(k, N-1-k), k=0 to N-1. There are many possible DSTs in a mesh. We choose the one shown in Figure 19 (exemplified by a 3-level DST in an 8x8 mesh) because it is simple to control.

As shown in Figure 19, the root of the DST is at PE(0, 0) (upper left corner of the mesh). The left son of
the root is four units away vertically (i.e. PE(4, 0)), while its right son is four units away horizontally
(i.e. PE(0, 4)).

The second level sons are two units away from the corresponding first level sons, vertically and
horizontally. Thus PE(2, 4) and PE(0, 6) are the sons of PE(0, 4), and PE(6, 0) and PE(4, 2) are the sons of
PE(4, 0).

Spanning in a similar way, all diagonal PEs of the mesh are the third level sons.
In general, an Upper Left DST (ULDST) in an NxN mesh has PE(0, 0) as the root and the diagonal PEs
(k, N-1-k), k=0 to N-1, as the leaves. The i-th level (i=1 to log N) left son of PE(s, t) is
PE(s+2^(log N - i), t), and the i-th level right son of PE(s, t) is PE(s, t+2^(log N - i)).
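The son rule can be checked with a short C sketch for the 8x8 case (N = 8 and log N = 3 are assumptions of the example). It expands the tree level by level from PE(0, 0) and counts how many leaves land on the diagonal PE(k, N-1-k); the function name is hypothetical.

```c
#include <assert.h>

#define N 8      /* mesh side, assumed to be a power of 2 */
#define LOGN 3   /* log2(N) */

/* Marks in node[][] every PE of the ULDST built from the son rule:
   the i-th level left son of PE(s, t) is PE(s + 2^(LOGN-i), t) and the
   right son is PE(s, t + 2^(LOGN-i)). Returns how many leaves lie on the
   diagonal PE(k, N-1-k). */
static int build_uldst(int node[N][N]) {
    int rows[N], cols[N];   /* PEs of the current tree level */
    int count = 1;
    for (int s = 0; s < N; s++)
        for (int t = 0; t < N; t++) node[s][t] = 0;
    rows[0] = 0; cols[0] = 0;            /* root PE(0, 0) */
    node[0][0] = 1;
    for (int i = 1; i <= LOGN; i++) {
        int step = 1 << (LOGN - i);
        int next_rows[N], next_cols[N], n = 0;
        for (int j = 0; j < count; j++) {
            next_rows[n] = rows[j] + step; next_cols[n] = cols[j]; n++;   /* left son  */
            next_rows[n] = rows[j]; next_cols[n] = cols[j] + step; n++;   /* right son */
        }
        for (int j = 0; j < n; j++) {
            rows[j] = next_rows[j];
            cols[j] = next_cols[j];
            node[rows[j]][cols[j]] = 1;
        }
        count = n;
    }
    int on_diagonal = 0;
    for (int j = 0; j < count; j++)
        on_diagonal += (rows[j] + cols[j] == N - 1);
    return on_diagonal;
}
```

For N = 8 it marks 2N-1 = 15 PEs, and all eight leaves fall on the diagonal, matching the third level sons described above.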
The control algorithm for ULDST is listed below.

ULDST()
{
    int fs=0, fr=0;   /* flag-send is used to construct the DST */
                      /* flag-receive is an intermediate var to update fs */
    int pid0, pid1;
    int t;            /* t is time step */
    int M, logM;      /* M is the side size of the mesh and logM=log M */
    int hmask, vmask;
    int treemask=M/2; /* a flag to construct the tree */

    if (pid0==0 && pid1==0) {
        fs = 1; fr = 1;}

    for(t=0;t<logM;t++){
        hmask = ANDALLBIT(~treemask | pid0);
        vmask = ANDALLBIT(~treemask | pid1);
        if (!hmask | !vmask)
            {SHORT_WE; SHORT_NS; DISABLE;}
        if ((hmask | vmask) && fr)
            {E = fs; S = fs; fr = 0;}
        if ((hmask | vmask) && !fr)
            {fs = W | N; fr = W | N;}
        treemask = ASHIFT(treemask, 1);
    }
}

Using an 8x8 polymorphic mesh as an example, the ULDST algorithm can be explained as follows. A
root PE(000, 000) is selected for the DST and its fs and fr are set to 1. At time step t=0, rows 000 and 100
and columns 000 and 100 are active while the rest are disabled. The disabled PEs short their WE and NS
paths to establish a temporary path for updating fs. The active PE with fr=1 sends its fs value to its E and S
neighbors, then resets fr to 0 so that it will not be a sender at the next time step. The receivers (with fr=0)
update their fs as the ORed value of N and W. The receivers also update their fr with the same ORed result;
therefore a PE that has just received a 1 from N or W will be the sender at the next time step. Step t=0
selects two PEs (PE(000, 100) and PE(100, 000)) as nodes of the DST (by setting their fs = 1); furthermore,
this step prepares them as the new senders (by setting fr = 1) to set more DST nodes in the following step.

At the next step, each of the two new senders will produce two DST nodes and two senders in a similar
way. The new nodes and senders are PEs (000, 110), (010, 100), (100, 010) and (110, 000).






0 257 581 



At t=2, the diagonal PEs are reached by the control algorithm; their fs is set to 1 to identify them as
part of the DST. Furthermore, their fr is set to 1, which additionally identifies them as diagonal nodes. This
diagonal identification is a bonus of the DST algorithm: it is useful for many types of computing discussed
in the next section.
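The walkthrough can be replayed with a host-side C simulation of the 8x8 case. It is a sketch under stated assumptions, not a model of the PE hardware: the active set at step t is computed directly as the PEs whose row and column indices have their low (logM-1-t) bits clear, as in the walkthrough, rather than through ANDALLBIT; a disabled PE is modelled as a wire that passes values from W to E and from N to S; a PE that is not sending drives 0; senders and receivers are disjoint within a step; and a receiver ORs W | N into fs so that nodes set on earlier steps keep fs = 1, which the DST formation described next relies on (the listing as printed assigns W | N). All names are illustrative.

```c
#include <assert.h>

#define M 8      /* mesh side (assumed power of 2) */
#define LOGM 3

static int fs[M][M], fr[M][M];   /* flag-send and flag-receive of every PE */

/* Assumed active set at step t: low (LOGM-1-t) bits of row and column are 0,
   e.g. rows/columns 000 and 100 at t=0 in an 8x8 mesh. */
static int active(int r, int c, int t) {
    int mask = (M >> (t + 1)) - 1;
    return (r & mask) == 0 && (c & mask) == 0;
}

/* Value a PE drives on E and S: fs if it is a sender (fr = 1), else 0. */
static int drive(int r, int c) { return fr[r][c] ? fs[r][c] : 0; }

static void simulate_uldst(void) {
    for (int r = 0; r < M; r++)
        for (int c = 0; c < M; c++) fs[r][c] = fr[r][c] = 0;
    fs[0][0] = fr[0][0] = 1;                       /* root PE(0, 0) */
    for (int t = 0; t < LOGM; t++) {
        int w[M][M] = {{0}}, n[M][M] = {{0}};
        /* Phase 1: active PEs sample W and N through any shorted PEs. */
        for (int r = 0; r < M; r++)
            for (int c = 0; c < M; c++) {
                if (!active(r, c, t)) continue;
                for (int k = c - 1; k >= 0; k--)
                    if (active(r, k, t)) { w[r][c] = drive(r, k); break; }
                for (int k = r - 1; k >= 0; k--)
                    if (active(k, c, t)) { n[r][c] = drive(k, c); break; }
            }
        /* Phase 2: senders retire, receivers latch W | N. */
        for (int r = 0; r < M; r++)
            for (int c = 0; c < M; c++) {
                if (!active(r, c, t)) continue;    /* disabled: shorted WE and NS */
                if (fr[r][c]) {
                    fr[r][c] = 0;
                } else {
                    int in = w[r][c] | n[r][c];
                    fs[r][c] |= in;                /* assumption: keep earlier nodes */
                    fr[r][c] = in;
                }
            }
    }
}

static int count_set(int f[M][M]) {
    int total = 0;
    for (int r = 0; r < M; r++)
        for (int c = 0; c < M; c++) total += f[r][c];
    return total;
}
```

After the three steps, fs is set on the 15 tree PEs and fr on exactly the eight diagonal PEs, which is the diagonal identification mentioned above.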
To form a DST, the flag fs is used as the condition: PEs with fs=0 short their WE and NS paths, while PEs
with fs=1 send MEM to E and S and receive data from W and N.

Using the mechanism in the connection control block, "treemask" is loaded into SRM, fs into F1 and fr into
F2. The INVERTed SRM is ORed with pid0 in register X and with pid1 in register Y; the ORed results are
"ANDALLBITed" to produce "hmask" in XANDALL and "vmask" in YANDALL. At the end of each step, SRM is
shifted arithmetically one bit to the right. The SHORT action is then based on F2, XANDALL and YANDALL.

By choosing a different root and using a similar control algorithm, a different DST can be formed in the
same polymorphic mesh. Figure 20 shows a DST with the root in the lower right corner; it is called LRDST.
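A small C sketch makes the relation between the two trees explicit. It generalizes the son rule to a root at any corner by giving each step a direction (+1 for ULDST, -1 for LRDST; this parameterization and the function names are assumptions of the sketch) and enumerates the leaves by choosing, at level i, either the vertical or the horizontal step of length 2^(log N - i).

```c
#include <assert.h>

#define N 8      /* mesh side (assumed power of 2) */
#define LOGN 3

/* Leaves of a DST rooted at (r0, c0). dr/dc are the directions of the
   vertical ("left") and horizontal ("right") sons. Bit (LOGN-i) of `path`
   selects the vertical step at level i, otherwise the horizontal one. */
static void dst_leaves(int r0, int c0, int dr, int dc, int leaf_row[N], int leaf_col[N]) {
    for (int path = 0; path < N; path++) {
        int r = r0, c = c0;
        for (int i = 1; i <= LOGN; i++) {
            int step = 1 << (LOGN - i);
            if (path & step) r += dr * step;   /* left (vertical) son */
            else             c += dc * step;   /* right (horizontal) son */
        }
        leaf_row[path] = r;
        leaf_col[path] = c;
    }
}

/* True when every leaf is a diagonal PE(k, N-1-k). */
static int all_on_diagonal(const int row[N], const int col[N]) {
    for (int k = 0; k < N; k++)
        if (row[k] + col[k] != N - 1) return 0;
    return 1;
}
```

Both trees end on the same N diagonal PEs, which is what lets a result travelling down the ULDST and one travelling up the LRDST meet there.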

In the following we show that the coexistence of ULDST and LRDST allows us to compute A*X+B*Y+C
in parallel for each pixel (x, y) in an image. This capability has a very wide range of applications in computer
graphics and computer vision.



APPLICATIONS 


Besides the well-known application of a plain mesh to image processing, the following applications of the
polymorphic mesh are either faster in the polymorphic mesh or not implemented in the plain mesh. These
applications are categorized into six types.


(1) Divide-and-conquer computing

This type of computing involves dividing a set of N data into two groups according to a property.
Then, by applying the same property, each group is further divided into two subgroups. This process is
repeated until each group contains only one datum.

For mesh connection, this type of computing has a complexity of O(N) or higher. By transforming to
trees and pyramids, the complexity of this type of computing becomes O(log N) in the polymorphic mesh. The
speedup for a data set of N = 1024 is 100:1, a two-orders-of-magnitude improvement.

Computations belonging to this type include sorting and finding the maximum, minimum, k-th largest and
median. All these algorithms are of complexity O(log N).
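As a sketch of why tree patterns give O(log N), the C function below finds a maximum by pairwise reduction, one tree level per pass. It is a generic tree reduction written for illustration, not code from this document. For N = 1024 it needs log2 1024 = 10 passes, against on the order of N steps on a plain mesh, and 1024/10 is roughly the 100:1 speedup quoted above.

```c
#include <assert.h>

/* Tree reduction for the maximum of n values (n a power of 2). Pass k uses
   stride s = 2^k: element i absorbs element i + s, so after log2(n) passes
   v[0] holds the maximum. v[] is overwritten; returns the pass count. */
static int tree_max(int v[], int n, int *result) {
    int steps = 0;
    assert(n > 0 && (n & (n - 1)) == 0);
    for (int s = 1; s < n; s <<= 1, steps++)
        for (int i = 0; i + s < n; i += 2 * s)
            if (v[i + s] > v[i]) v[i] = v[i + s];
    *result = v[0];
    return steps;
}
```

Swapping the comparison gives the minimum; the same stride pattern with addition gives sums.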



(2) Iconic-to-Symbolic Conversion

This type of computing is specific to computer vision and is often called intermediate level processing.
Given an image, we are interested in knowing

(a) how many pixels satisfy a specified property; 

(b) which pixels satisfy the specified property; 

(c) whether SOME, NONE or ALL pixels satisfy the specified property.
The property for the above can be:

(a) equal to a value;

(b) greater than a value;

(c) smaller than a value;

(d) a condition synthesized arithmetically and logically from the above.

All these algorithms can be computed in O(log N) steps in the polymorphic mesh by the tree and pyramid
patterns. More importantly, and in significant contrast to the conventional fixed-pattern approach, only the
answer (as against the whole intermediate image) is output. This significantly reduces the I/O rate; in the
extreme case, only one bit (YES/NO) as against 1024x1024 bits (the whole image) is output.
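A C sketch of such a query, assuming the property has already been evaluated into a 0/1 image: row trees followed by one column tree reduce the image to a count, and only three scalars leave the array. The function name and the 4x4 size are illustrative.

```c
#include <assert.h>

#define M 4   /* image side (assumed power of 2) */

/* img: 1 where the pixel satisfies the property. Row trees leave each row's
   total in column 0; a column tree then sums column 0, 2*log2(M) passes in
   all. Outputs the count, SOME (count > 0) and ALL (count == M*M); NONE is
   !SOME. img is overwritten. */
static void summarize(int img[M][M], int *count, int *some, int *all) {
    for (int s = 1; s < M; s <<= 1)               /* row trees */
        for (int r = 0; r < M; r++)
            for (int c = 0; c + s < M; c += 2 * s)
                img[r][c] += img[r][c + s];
    for (int s = 1; s < M; s <<= 1)               /* column tree over column 0 */
        for (int r = 0; r + s < M; r += 2 * s)
            img[r][0] += img[r + s][0];
    *count = img[0][0];
    *some = *count > 0;
    *all = *count == M * M;
}
```

The question "which pixels" is answered by the predicate image itself, which stays in the array.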

Related to I/O, an extra N-S path has traditionally been added to the mesh to support concurrent I/O and
processing. This mechanism and its benefit are also valid for the polymorphic mesh; however, they are
irrelevant to the invention.
(3) Statistical Measurement

The polymorphic mesh is capable of computing the following statistics in O(log N) steps. The statistics
include:

(a) mean, variance, standard deviation;

(b) area, perimeter and centroid;

(c) first moment, second moment and cross moment.

Item (a) applies to any set of N data, while items (b) and (c) are specific to an image.
These statistics are the foundations of other algorithms; in computer vision, they are the basis for region
analysis and pattern recognition.
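The statistics reduce to a handful of global sums, each of which a tree or pyramid reduction can deliver. The C sketch below, with hypothetical names, uses plain loops in place of those reductions and derives area, centroid, mean and (population) variance from five sums over a region mask; the standard deviation would be the square root of the variance.

```c
#include <assert.h>

#define M 4   /* image side, small for illustration */

typedef struct { double area, cx, cy, mean, variance; } RegionStats;

/* Five sums over the masked region: count, sum x, sum y, sum v, sum v*v.
   On the polymorphic mesh each would come from a tree/pyramid reduction. */
static RegionStats region_stats(int mask[M][M], int val[M][M]) {
    double n = 0, sx = 0, sy = 0, sv = 0, svv = 0;
    for (int y = 0; y < M; y++)
        for (int x = 0; x < M; x++)
            if (mask[y][x]) {
                n += 1; sx += x; sy += y;
                sv += val[y][x];
                svv += (double)val[y][x] * val[y][x];
            }
    RegionStats s = {n, 0, 0, 0, 0};
    if (n > 0) {
        s.cx = sx / n;                              /* centroid */
        s.cy = sy / n;
        s.mean = sv / n;
        s.variance = svv / n - s.mean * s.mean;     /* population variance */
    }
    return s;
}
```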



(4) Compute A*X+B*Y+C

To compute A*X+B*Y+C, four patterns need to be formed by the polymorphic mesh: one
Upper-Left DST (ULDST), one Lower-Right DST (LRDST), Row-Buses and Column-Buses. The ULDST and
LRDST must coexist to compute A*X+C and B*Y simultaneously, while the Row-Buses and the Column-Buses
coexist to do the summing (i.e. A*X+C+B*Y).

The algorithm is performed in a bit-serial manner. The extra two trees in the pixel-plane can be
eliminated. Constant arguments A and B are broadcast to all PEs before the computing begins and are
stored in arrays A and B with (logM-1) "0"s prepended at the beginning of the arrays. The storage of A and
B is bit-reversed so that, after the prepended "0"s are accessed, the least significant bits of A and B are
accessed first. The constant argument C is injected into the polymorphic mesh through W of the root of the
ULDST, one bit per time step, starting from the least significant bit. Using the ULDST as the tree to compute
A*X+C, each PE has three variables (sum, carry and delay). At each time step, "sum" is passed eastbound
and "delay" is passed southbound while each PE performs two operations: (a) add N to array A, storing the
carry bit in "carry", and (b) store N in "delay". After logM steps, the diagonal PEs of the mesh (i.e. the leaves
of the ULDST) store A*X+C for the corresponding row. Similarly, the computing of B*Y can be done by the
LRDST with "0" injected from E of the root and with each PE passing "delay" northbound and "sum" westbound.
After logM steps, the diagonal elements store B*Y for each corresponding column.

After obtaining A*X+C in rows and B*Y in columns, the polymorphic mesh changes to Row-buses in the
WE path and Column-buses in the NS path. Each PE then adds the value on its Row-bus to the value on its
Column-bus to produce A*X+B*Y+C in bit-serial fashion.

Since there is a resource conflict in establishing the DSTs and the buses simultaneously, the result
A*X+B*Y+C is delivered bit-serially at every other time step.
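The final summing step can be sketched in C as a bit-serial full adder, one bit per time step with a single carry bit kept between steps. This models only what each PE does with the Row-bus and Column-bus bit streams; it assumes unsigned K-bit operands and truncates the result to K bits. The DST phase that produces A*X+C and B*Y is not simulated, and the function names are hypothetical.

```c
#include <assert.h>

/* Adds two K-bit unsigned values presented LSB first, one bit per step. */
static unsigned bit_serial_add(unsigned a, unsigned b, int K) {
    unsigned sum = 0, carry = 0;
    assert(K > 0 && K <= 32);
    for (int t = 0; t < K; t++) {                 /* one bit per time step */
        unsigned x = (a >> t) & 1u;               /* Row-bus bit */
        unsigned y = (b >> t) & 1u;               /* Column-bus bit */
        unsigned s = x ^ y ^ carry;               /* full-adder sum bit */
        carry = (x & y) | (carry & (x ^ y));      /* full-adder carry out */
        sum |= s << t;
    }
    return sum;
}

/* Hypothetical per-pixel view: row value A*x + C plus column value B*y. */
static unsigned pixel_value(unsigned A, unsigned B, unsigned C,
                            unsigned x, unsigned y, int K) {
    return bit_serial_add(A * x + C, B * y, K);
}
```

With A = 3, B = 5, C = 2, the pixel (4, 6) receives 14 on its Row-bus and 30 on its Column-bus and produces 44.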



(5) Fast Line Detection 

With the capability of computing A*X+B*Y+C every two time steps, the polymorphic mesh can let every
pixel P(X, Y) in an MxM image decide whether it is on a given line determined by A, B and C.
Assuming that all numbers are K bits long, the decision can be made in logM+2K time steps.
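Once a PE holds A*X+B*Y+C, the decision is a sign test. The C sketch below, with illustrative names, classifies one pixel and counts the on-line pixels of an MxM image. Which nonzero sign means "left" and which means "right" depends on the orientation chosen for (A, B), so the sketch reports only positive, negative or zero.

```c
#include <assert.h>

enum Side { NEGATIVE_SIDE = -1, ON_LINE = 0, POSITIVE_SIDE = 1 };

/* Sign test applied to the value A*x + B*y + C held by PE(x, y). */
static enum Side classify(long A, long B, long C, long x, long y) {
    long v = A * x + B * y + C;
    if (v == 0) return ON_LINE;
    return v > 0 ? POSITIVE_SIDE : NEGATIVE_SIDE;
}

/* Number of pixels of an MxM image lying exactly on the line. */
static int count_on_line(long A, long B, long C, int M) {
    int n = 0;
    assert(M > 0);
    for (int y = 0; y < M; y++)
        for (int x = 0; x < M; x++)
            n += classify(A, B, C, x, y) == ON_LINE;
    return n;
}
```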

The capability of fast line detection is very useful in computer graphics and computer vision. For
computer graphics, it is useful in displaying convex polygons, in creating shadows, in clipping, in drawing
spheres, in computing adaptive histogram equalization, in texture mapping and in anti-aliasing. For computer
vision, such a capability is useful in computing the Fast Hough Transform for detecting lines in a noisy image.



(6) Converting Symbolic Information to Iconic Information 

With the massively parallel hardware available in the polymorphic mesh, it is advantageous to convert
symbolic processing (usually not done in a mesh) into iconic processing so that the processing can be done
in a massively parallel way. The Fast Hough Transform mentioned above is one such example. The "mask
generation" described below is another class of applications in this category.
(6.1) Band Mask Generation

A "band mask" is confined within two parallel lines, one of which is determined by (A, B, C1) and the
other by (A, B, C2). To generate a "band mask", each PE computes A*X+B*Y+C1 and A*X+B*Y+C1+
(C2-C1) as described above. The computing produces S1 (the sign of A*X+B*Y+C1) and S2 (the sign of
A*X+B*Y+C2), both of which are used in deciding whether pixel (X, Y) is inside the band.
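A minimal C sketch of the band test under the description above: the second value is obtained from the first by adding (C2 - C1), and a pixel is treated as inside, boundary included, when S1 and S2 are not of the same strict sign. The function names are hypothetical.

```c
#include <assert.h>

static int sign_of(long v) { return (v > 0) - (v < 0); }

/* Band between A*x + B*y + C1 = 0 and A*x + B*y + C2 = 0. */
static int in_band(long A, long B, long C1, long C2, long x, long y) {
    long v1 = A * x + B * y + C1;
    long v2 = v1 + (C2 - C1);          /* second line reuses the first value */
    int s1 = sign_of(v1), s2 = sign_of(v2);
    return s1 * s2 <= 0;               /* opposite signs, or on a boundary */
}
```

For the vertical band between x = 2 and x = 5 (A = 1, B = 0, C1 = -2, C2 = -5), columns 2 to 5 are inside and all others outside.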

The "band mask" gives a computer vision system the capability of processing only the region of
interest. Human vision adopts a different strategy in generating masks. The "strategy" is symbolic
information, but its processing is actually done iconically as described.



(6.2) Polygonal Mask Generation

The "polygonal mask" is a generalization of the band mask. It is formed from P half planes,
each of which is determined by a line specified by A*X+B*Y+C. Using the line detection capability, we can
obtain signs S1, S2 to SP for the corresponding lines. A Boolean combination of S1 to SP determines whether
pixel (X, Y) is inside the polygon.
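For the common case of a convex polygon, the Boolean combination is simply an AND of the half-plane signs. The C sketch below assumes each half plane is written so that its inside satisfies A*X+B*Y+C >= 0; the type and function names are hypothetical, and non-convex polygons would need a different combination.

```c
#include <assert.h>

typedef struct { long A, B, C; } HalfPlane;   /* inside: A*x + B*y + C >= 0 */

/* Convex-polygon mask: inside when no sign S1..SP is negative. */
static int in_convex_polygon(const HalfPlane h[], int P, long x, long y) {
    for (int i = 0; i < P; i++)
        if (h[i].A * x + h[i].B * y + h[i].C < 0) return 0;
    return 1;
}
```

The triangle x >= 1, y >= 1, x + y <= 6 is described by the three half planes (1, 0, -1), (0, 1, -1) and (-1, -1, 6).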



CONCLUSIONS

The preferred embodiment of the invention carries out the following transforms under control of the
connection control mechanism:

The physical MxM mesh connection to one row linear array and one column linear array, each MxM long.
The physical MxM mesh connection to M row trees, each of which has M leaves.
The physical MxM mesh connection to M column trees, each of which has M leaves.
The physical MxM mesh connection to an MxM orthogonal tree.

The physical MxM mesh connection to M reverse row trees, each of which has M leaves.
The physical MxM mesh connection to M reverse column trees, each of which has M leaves.
The physical MxM mesh connection to M row buses, each of which has M PEs.
The physical MxM mesh connection to M column buses, each of which has M PEs.
The physical MxM mesh connection to a pyramid with an MxM base and a shrinkage K, where K is a power
of 2.

The physical MxM mesh connection to a reverse pyramid with an MxM base and a shrinkage K, where K is a
power of 2.

The physical MxM mesh connection to an MxMxK cube, where K is an integer limited only by the
local memory of the PE.

The invention carries out the following transforms under control of programming outside the connection
control mechanism:

The physical MxM mesh connection to a DST (whose root can be at any corner of the mesh). Up to
two DSTs can coexist if their roots are at the opposite corners of a diagonal.

The physical MxM mesh connection to an MxM orthogonal tree in O(M^2) silicon area. A saving by a factor of
O((log M)^2) is obtained by the invention, where M is the side size of the orthogonal tree.
The class of divide-and-conquer algorithms is transformed from linear, O(M), complexity into logarithmic,
O(log M), complexity. A saving of M/log M is obtained by the invention. This class of algorithms is discussed
in the description of the preferred embodiment.

The iconic-to-symbolic conversion (intermediate level processing) occurs within the polymorphic mesh such
that the I/O is significantly reduced. In an extreme case, as discussed in the description of the preferred
embodiment of the invention, a reduction of six orders of magnitude is obtained.

Symbolic representation can be transformed into iconic representation by the above patterns such that the
processing can be performed iconically with the massive parallelism available in the mesh. Such a feature
expands the capability of the mesh into the domain of symbolic processing.
The mesh system is capable of:

(a) performing iconic processing,

(b) converting iconic information to symbolic information,

(c) converting symbolic information to iconic information, and

(d) performing symbolic processing in its iconic equivalent.
The invention allows the computing of A*X+B*Y+C in an MxM mesh in O(log M) steps for every pixel
(x, y) in parallel, where A, B and C are constant integers.

The invention allows the detection for every pixel (x, y) in an MxM image of whether the pixel (x, y) is (a)
on the line, (b) to the right of a line or (c) to the left of a line in O(log M) steps.
The invention allows the parallel detection for every pixel (x, y) of an MxM image of whether the pixel is
inside or outside of a band, where the band is formed by two parallel lines.

The invention allows the parallel detection for every pixel (x, y) of an MxM image of whether the pixel is
inside or outside of a polygon.

The invention can be generalized to a 3D mesh (physical cube) to form the higher-dimensional extension
of the twelve patterns by adding a register Z, a shift register SRZ and a flag F3 to the connection control
mechanism. (E.g. the higher-dimensional extension of a cube is a 4D hypercube.)

The 3D image element is the voxel. The voxel is the volume element analogous to the 2D pixel, which
has area only. The 3D extension of the invention allows the parallel detection for each voxel (x, y, z) of
(a) whether the voxel is inside or outside of a region formed by two parallel planes,

(b) whether the voxel is inside or outside of a polyhedron, or

(c) whether the voxel is on, to the left of or to the right of a plane.

The invention applies the concept of "polymorphic" to a physical mesh via a "connection control
mechanism." The same concept and mechanism can be generalized to other physically fixed connections.

The pattern formed by the polymorphic mesh can be adapted to the nature of the data by loading the
F1 and/or F2 registers via the output of the ALU.

Arbitrary patterns can be formed by the polymorphic mesh by setting the F1 and F2 registers via
instruction or memory. The instruction, the memory value, intermediate processing values from a neighboring
processing element, and diagnostic information within the processing element are representative system
operation parameters which may be used to set the flag registers, by well-known techniques and simple
means not shown. The connection control mechanism thus comprises flag register means, which flag
register means is settable as a function of system operation parameters, to provide control information
usable in setting a new pattern into at least one of the pattern registers. This capability, accessible to the
programmer, permits the programmer to set up adaptation to data-related and condition-related future event
possibilities. Upon occurrence of such an event, a flag register is set and a new pattern is fetched or
calculated in response.
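A hypothetical C model of this adaptation, loosely following Claims 2 to 5 (a standard and an alternate pattern register, a binary selection device and a flag register). The bit layout of a pattern is not given in this section, so patterns are treated as opaque 8-bit words, and the threshold test that stands in for an ALU-derived event is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical connection control state of one PE. */
typedef struct {
    uint8_t pattern[2];   /* standard (0) and alternate (1) pattern registers */
    int select;           /* binary selection device */
    int F1, F2;           /* flag registers */
} ConnControl;

/* The ALU result sets F1; on the event, a new (fetched or calculated)
   pattern is loaded into the alternate register and selected. Returns the
   pattern now driving the switch. */
static uint8_t on_alu_result(ConnControl *cc, int alu_result, int threshold,
                             uint8_t new_pattern) {
    cc->F1 = alu_result > threshold;   /* flag set as a function of the data */
    if (cc->F1) {
        cc->pattern[1] = new_pattern;  /* load the new pattern */
        cc->select = 1;                /* switch to the alternate register */
    }
    return cc->pattern[cc->select];
}
```

The standard register is left untouched, so a later event could switch back to it without refetching.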

Thus, while the invention has been described with reference to a preferred embodiment, it will be
understood by those skilled in the art that various changes in form and details may be made without
departing from the scope of the invention.


Claims 

1. An optimizable reconfigurable array processing system for performing under programmable control a
series of programmably defined tasks upon input images, having differing optimal system configurations as
a composite function of task definition and image input so that at differing times under differing conditions
there are identifiable differing optimal configurations, comprising:

system control means (3) including operator input means, image input means and operation control means,
an array of polymorphic mesh processing elements (2), each comprising:
a memory (7);

an ALU (6) connected to said memory; and

connection control mechanism (8), with a finite number of simple connection paths to said ALU (6) and to a
related subset of said array of polymorphic mesh processing elements (2), with means (10) to form
selective internal and external interconnections of the polymorphic mesh processing element according to a
connection control pattern, and means (21,22,23) to provide a connection control pattern.

2. An optimizable reconfigurable array processing system according to Claim 1, wherein said connection
control mechanism comprises pattern presentation means (21,22) making available to the processing
element a surplus of pattern bit values defining a standard set of pattern values and an alternate set of
pattern values, and pattern value selection means (23) for selecting said standard set of pattern values or,
alternatively, for selecting said alternate set of pattern values,

and said connection control mechanism also comprises switching means (10) responsive to the selected
pattern bit values.
3. An optimizable reconfigurable array processing system according to Claim 2,
wherein said connection control mechanism (8) comprises a plurality of pattern registers (21,22), including a
standard pattern register and an alternate pattern register, and a crossbar switch (10), with external
connections to neighboring processing elements, with internal connections to said plurality of pattern
registers and to said ALU (6), and with control connections to said plurality of pattern registers; and

wherein said pattern presentation means comprises pattern register selection means (23), for selecting one
pattern register;

and said means controlling said connection control mechanism in accordance with an optimization pattern
includes gate switching means (16-20) responsive to the setting in the selected pattern register.
4. An optimizable reconfigurable array processing system according to Claim 3, wherein said connection
control mechanism comprises two pattern registers (21,22), and said pattern register selection means
for selecting one pattern register is a binary device (23).

5. An optimizable reconfigurable array processing system according to Claim 3, wherein said connection
control mechanism comprises flag register means (31,32), said flag register means being settable as a
function of system operation parameters, to provide control information usable in setting a new pattern into
at least one of said pattern registers.

6. A dynamically optimizable reconfigurable array processing system comprising system control means
including operator input means, image input means, operation control means and operation monitoring
means controlling overall system operation, simultaneously monitoring so as to determine optimal system
configuration as a composite function of operator input means and operation monitoring means, providing a
signal defining an optimal configuration selection; and

an array of polymorphic mesh processing elements, each having a memory and an ALU, and each having a
finite number of connection paths to related polymorphic mesh processing elements,
each having a polymorphic mesh connection control block having capability to form selective
interconnection of the polymorphic mesh processing element by short-circuiting selected connection paths,
and by selecting a logical connective, and

means connecting said optimizing reconfiguration control signal to said polymorphic mesh connection
control block.

7. A processing element for a dynamically optimizable reconfigurable array processing system comprising
a multiplicity of processing elements generally controlled by a host computer to carry out image
processing as a network, comprising:
memory means;
processing means;
I/O connection means;
internal connection means;

connection control means controlling the relationships of said other means in accordance with a control
pattern; and

means to alter the control pattern in said connection control means.

8. A processing element for a dynamically optimizable reconfigurable array processing system according
to Claim 7, wherein said means to alter the control pattern in said connection control means is means
responsive to intermediate level processing in a related processing element.

9. A processing element for a dynamically optimizable reconfigurable array processing system according
to Claim 7, wherein said means to alter the control pattern in said connection control means is means
responsive to fault indication in a related processing element.

10. A processing element for a dynamically optimizable reconfigurable array processing system
according to Claim 7, comprising in addition external memory data connection means (EMD 102)
connecting to said memory means, processor means and connection control means (8).

11. A processing element for a dynamically optimizable reconfigurable array processing system
according to Claim 10, comprising in addition a multiplexer and a plurality of bus connections for said
external memory data connection to said memory means, said processing means and said connection
control mechanism,

whereby said processing element may be operating on local memory and external memory simultaneously.
FIG. 7
FIG. 16
FIG. 19
FIG. 20